The authors of this book are university professors in finance with many years of
experience in research and teaching. During these years, not just one but several
radical changes could be observed. In the period between 1960 and 1980, research
in finance was primarily characterized by theoretical, model-based analysis.
Another driving force was the emergence of modern markets such as the Chicago
Board of Trade, which contributed to a significant increase in interest in the
results of this theoretical research. In addition, the enormous availability of
data led to extensive growth in empirical research, with theoretical
investigations taking a back seat. Furthermore, the interdisciplinary orientation
of the field ensured that mathematicians also became enthusiastic about the
subject and indeed strengthened it. While this short overview of the scientific
development is undoubtedly incomplete, the concept of Brownian motion has always
played an important role in it.

There is no shortage of books providing a mathematically precise introduction to
this important concept. Similarly, a great deal of empirical research has
analyzed the Brownian model. Furthermore, there exists an extensive literature for
practitioners looking for a first introduction to Brownian motion. However, it is
our impression that in this rapid development an important aspect of Brownian
motion was lost. In particular, the relationship between mathematics and economics
did not receive the same level of attention. For our own courses, we were looking for
a textbook that could explain to interested students of economics and finance what
mathematicians understand by a Brownian motion, without these students having to
struggle with the deeper secrets of mathematics. We could not find such a book.
Therefore, we decided to write it ourselves. The reader is holding it in her hands.


Berlin, Germany Andreas Löffler 
Berlin, Germany Lutz Kruschwitz 
March 2019 


There are many people to whom we owe thanks. Especially, Uwe Dulleck, Deborah 
Gelernter, Matthias Lang, Roberto Liebscher, and Bernhard Nietert helped us with 
many critical remarks. The discussions with our longtime companion Dominica 
Canefield encouraged us to complete this project. Special thanks go to our friend 
It is a mistake to think about a mathematical model as if it were
the reality. In the physical sciences, where the model often fits 
reality very well, this may be a convenient way of thinking that 
causes little harm. But in the social sciences, models are often 
little better than caricatures. 


Ian Stewart
In Pursuit of the Unknown (page 127) 


1.1 Stochastics in Finance Theory 


Anyone who is occupied with modern finance theory will soon come across terms
such as Brownian motion,¹ random processes, measure, and Lebesgue integral.²
Based on the many years of experience we have gained in university teaching,
we claim that some readers do not have sufficient knowledge in this field unless
they have studied mathematics. Therefore, they may not know what is meant by
probability measures, Brownian motions, and similar terms.


Various Random Processes Time series of share prices generally look very
different from price developments of bonds, which can be explained (among other
reasons) by the fact that bonds, in contrast to equities, have a limited term. As
the remaining time to maturity becomes shorter, bond prices always approach their
nominal value,³ while it is extremely rare for stock prices to stabilize, as
shown in Fig. 1.1. The development of the base interest rate of the
European Central Bank in the period between 2009 and 2015 gives a different
picture in every respect (see Fig. 1.2). In both cases, however, we are dealing with
processes that would undoubtedly be described as random. While the first process
seems to be in constant motion, the second process remains stable over longer
periods of time and jumps up or down at irregular intervals by amounts that
seem unpredictable.


¹ Robert Brown (1773-1858, British botanist).
² Henri Léon Lebesgue (1875-1941, French mathematician).
³ We talk about the "pull-to-par" phenomenon.


Fig. 1.1 Conceivable share price development


Fig. 1.2 Development of the ECB's base interest rate from 2009 to 2015. Source: www.finanzen.net/leitzins/@historisch


If one now wants to do justice to the developments shown in these illustrations
with the help of mathematical random processes, one has to resort to
different models. The theory of random processes provides a comprehensive set of
instruments. Mathematicians speak of stochastic processes and distinguish between
Markov, Gauss, and Feller processes, each with several variants. Brownian motions,
belonging to the class of Gaussian processes, are particularly prominent in the
literature on finance theory.⁴


⁴ Carl Friedrich Gauß (1777-1855, German mathematician).
Alternatives in Dealing with a New Scientific Terrain If you want to enter a
previously unknown field of knowledge, you inevitably will be confronted with 
terms and contexts you have never been exposed to before. There are various 
possibilities to cope with the situation. Two typical options are as follows: 


A thorough method is to put aside the text that is currently of interest and search 
for special sources dealing with the previously unknown terms and concepts. This 
can be very time-consuming and students of economics in particular cannot or do 
not always want to afford this approach. 

Alternatively, one can continue studying the material in the hope of gaining some sort
of intuitive understanding of the new terms and concepts. This approach is inevitably
superficial. Nevertheless, it may be adequate if the authors are experienced textbook
writers. However, they usually do not provide sufficient details. After all, one wants
to keep the reader on board and not expect him to specialize in a peripheral field. The
latter approach therefore also has its shortcomings.


1.2 Precision and Intuition in the Valuation of Derivatives 


At this point we want to give our readers a first glimpse of how careful you have to 
be if you want to be logically consistent with Brownian motions in finance theory. 


$dt$ and $\Delta t$ To this end, we start with a discrete model that describes the
development of a share price. We look at any point in time $t$ and ask how we could
describe the change of the share price after the period $\Delta t > 0$. For example, we can
imagine $\Delta t$ being a day. If we call the change of the current share price $\Delta S$, this
amount could be modeled by


$$\Delta S = \mu S\,\Delta t + \sigma S\,\Delta z, \tag{1.1}$$


where $S$ is the current share price. The parameters $\mu$ and $\sigma$ should be any positive
numbers at first.⁵ $\Delta t$ is, as already mentioned, the change in time, i.e., 1 day. The
variable $\Delta z$, not yet explained, should be the change of a random number during
the time interval $\Delta t$. For example, you could imagine a coin being flipped at the
end of each day: $\Delta z$ will be +2% if heads appears and −1% otherwise. None of
the variables on the right side of Eq. (1.1) is especially "exciting," and therefore
none requires much attention. It should be emphasized, however, that it would be
entirely unproblematic to divide the equation by $\Delta t$, because mathematically $\Delta t$ is
a real number. With objects such as the real numbers you can perform many other
mathematical operations without having to be particularly careful. For real numbers
certain axioms apply which the mathematical layperson usually is not aware of. But
it follows from the axioms that these objects can be used to perform the operations
known as addition, subtraction, multiplication, and division, with which even
mathematical laypersons are quite familiar.


⁵ We could make the coefficients $\mu$ and $\sigma$ time-dependent, which would not change anything
decisive in our remarks.
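The discrete model (1.1) can be tried out directly on a computer. The following is only a minimal sketch under stated assumptions: the function name `simulate_discrete_prices`, the fair coin, the starting price, the parameter values, and the choice of 250 trading days per year are all illustrative and not taken from the text.

```python
import random


def simulate_discrete_prices(s0, mu, sigma, dt, n_steps, seed=0):
    """Iterate Eq. (1.1): dS = mu*S*dt + sigma*S*dz, with a coin-flip dz."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    prices = [s0]
    for _ in range(n_steps):
        s = prices[-1]
        # Coin flip as described in the text: +2% for heads, -1% otherwise.
        dz = 0.02 if rng.random() < 0.5 else -0.01
        ds = mu * s * dt + sigma * s * dz
        prices.append(s + ds)
    return prices


# Illustrative values only (assumptions, not taken from the text).
dt = 1 / 250  # one trading day, measured in years
prices = simulate_discrete_prices(s0=100.0, mu=0.05, sigma=1.0, dt=dt, n_steps=5)

# Because dt is an ordinary real number here, dividing by it is harmless.
changes_per_unit_time = [(b - a) / dt for a, b in zip(prices, prices[1:])]
print(prices)
```

Everything in this sketch is ordinary arithmetic with real numbers, which is exactly why the discrete model causes no difficulties.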


However, all this changes as soon as we turn to a continuous-time model. If
we call $dt$ a change in time approaching zero, and if $dz$ describes the change in a
random number within such a vanishing interval, and finally if $dS$ is to reflect the
change of the share price, then it seems natural to express $dS$ as


$$dS = \mu S\,dt + \sigma S\,dz. \tag{1.2}$$


Of course, we realize that $dt$ will never be exactly zero, otherwise time would
come to a standstill. But what should we imagine when it comes to the change of a
random variable within a vanishingly small interval of time? Such a change (i.e.,
$dz$) can be small, but it could also be relatively large or even vanish entirely if
chance would have it. Under no circumstances should this $dz$ be ignored.

Let us now focus on the object $dt$. We have stated above that it is infinitesimally
small. Which mathematical operations may be performed with it? The
layperson can hardly imagine that a real number $\Delta t$ could lose the property of being
a real number simply because it gets smaller and smaller and is therefore called $dt$.
However, if it did lose this property, Eq. (1.2) could not simply be divided by $dt$.
And in fact, $dt$ is not a real number.


A First Encounter with Wiener⁸ Processes We will show what problems can
arise if Eq. (1.2) is treated superficially. To this end, we first write (1.2) in a slightly
different form


$$dS = \mu S\,dt + \sigma S\,dW \tag{1.3}$$


with $dW$ taking the role of $dz$. $dW$ is a very special random process known as the
Wiener process or Brownian motion. If you want to learn a little more about


⁶ Therefore, an expression of the type $\sum_{i=1}^{\infty} \Delta t$ also makes sense. And if $\Delta t > 0$ holds, the sum is
infinite because the continued addition of positive real numbers (regardless of their size) leads
to an infinitely large positive value. We will return to this expression in the next footnote.


⁷ A mathematical layperson can, for example, realize this by trying to evaluate the expression
$\sum_{i=1}^{\infty} dt$. Does the expression go towards zero because the objects $dt$ are infinitely small? Or does
it go towards infinity because you add infinitely many of these objects? The solution is simpler than
the layperson might assume. It comes down to the fact that the question was pointless, because the
$dt$ are simply not real numbers. The operation whose result is asked for is simply not allowed.
This expression is as pointless as x% or a: 
⁸ The term "Wiener process" presumably does not go back to Norbert Wiener (see footnote 23 on
page 48), but to the German mathematician and physicist Christian Wiener (1826-1896). In 1863 he
was able to show that Brownian motion is a consequence of the molecular movements of the liquid,
thereby disproving the biological causes Brown himself had suspected.
this particular random process and restrict yourself to reading standard finance
textbooks, you will learn that $dW$ is a constantly evolving process for which


$$dW = \varepsilon \sqrt{dt} \quad \text{with } \varepsilon \sim N(0, 1) \tag{1.4}$$


applies.⁹ This expresses that the change of the random variable during the
infinitesimally small time interval $dt$ results from the product of a standard normally distributed
random number $\varepsilon$ and $\sqrt{dt}$.
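On a computer, Eq. (1.4) can only be imitated with a small but finite time step, which sidesteps precisely the question of what $dt$ is. The sketch below draws increments of the form $\varepsilon\sqrt{\Delta t}$ and checks that their sample variance is close to $\Delta t$. The helper names `wiener_increments` and `mean_and_variance`, the step size, and the sample count are assumptions chosen for illustration.

```python
import math
import random


def wiener_increments(dt, n_steps, seed=0):
    """Draw n_steps increments epsilon * sqrt(dt) with epsilon ~ N(0, 1)."""
    if dt <= 0:
        raise ValueError("dt must be a positive real number in this sketch")
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [rng.gauss(0.0, 1.0) * math.sqrt(dt) for _ in range(n_steps)]


def mean_and_variance(xs):
    """Sample mean and (population-style) variance of a list of floats."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, v


dt = 0.01  # small but finite step (assumed value)
increments = wiener_increments(dt, n_steps=20_000)
mean, variance = mean_and_variance(increments)
print(mean, variance)  # the mean should be near 0, the variance near dt
```

The sketch works with a real number $\Delta t > 0$ throughout. It says nothing about whether the step to an infinitesimal $dt$ is legitimate.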


Value of a Derivative As the study of finance textbooks continues, one learns that the change
in the value of financial titles, depending on the development of a share price, is
described by the so-called Itô lemma.¹⁰ The value of a derivative $f(S)$ depending on
the share price necessarily follows the stochastic process¹¹


$$df = \left( \frac{\partial f}{\partial S}\,\mu S + \frac{\partial f}{\partial t} + \frac{1}{2}\,\frac{\partial^2 f}{\partial S^2}\,\sigma^2 S^2 \right) dt + \frac{\partial f}{\partial S}\,\sigma S\,dW. \tag{1.5}$$


While the reader may not be concerned with the derivation of (1.5), he may,
however, be interested in its practical application.

Looking at Eqs. (1.3) and (1.5) from this perspective, one can see that the change
in the stock price ($dS$) as well as the change in the value of the derivative ($df$)
depend on the variables time ($dt$) and randomness ($dW$). If you now form a hedge
portfolio by buying $\frac{\partial f}{\partial S}$ units of shares and selling one unit of the derivative, the
random influences offset each other and you actually hold a risk-free portfolio.
If one proceeds this way, one can find a so-called fundamental equation¹² for each
derivative from which the risk is entirely eliminated.


Itô Lemma and Taylor Series There may be readers who want to understand
the relations more precisely. Such readers do not merely take note of the Itô
equation (1.5), but would like to be shown that this equation is correct. To do so, one
has to get into the mathematical literature, which is difficult to comprehend for readers
having only an economic background. The finance literature, however, also
likes to show ways of understanding the Itô lemma intuitively.¹³ This usually


⁹ Here, once again, there is a certain carelessness in dealing with infinitesimally small quantities. If
you want to take the square root of a number, it must not be negative. Therefore, $dt > 0$ must apply.
Of course the question arises why this relation should be fulfilled.


¹⁰ Itô Kiyoshi (1915-2008, Japanese mathematician).
¹¹ For a European call option with an exercise price of $K$, for example, the payout function $f(\cdot)$
depending on the share price is


$$f(x) = \max(x - K, 0).$$


¹² One also speaks of the Black-Scholes equation.
¹³ For example, see Copeland et al. (2005, p. 964 f.).
happens in such a way that a function $f(S + \Delta S, t + \Delta t)$ is approximated at
$f(S, t)$ with the help of a Taylor series.¹⁴ The result of such an exercise is


$$df \approx \left( \frac{\partial f}{\partial S}\,\mu S + \frac{\partial f}{\partial t} + \frac{1}{2}\,\frac{\partial^2 f}{\partial S^2}\,\sigma^2 S^2 \right) dt + \frac{\partial f}{\partial S}\,\sigma S\,dW. \tag{1.6}$$

The reader will easily realize that Eqs. (1.6) and (1.5) are not identical because a
Taylor series usually ends with an approximation error. However, if the approximation
formula (1.6) correctly describes the performance of a derivative, then the
hedge portfolio would not really be risk-free at all, but at best approximately
risk-free, without anything being known about the size of the approximation error. If this
portfolio were now to yield the risk-free interest rate, an arbitrage opportunity could exist,
which would nullify the decisive economic argument for deriving the Black-Scholes
equation. The allegedly plausible derivation of the Black-Scholes equation is
therefore anything but unproblematic.


1.3 Purpose of the Book 


We want to give readers who are interested in questions of finance theory, but who have neither
the time nor the inclination to attend a complete mathematics course, an understandable
introduction to stochastic integration and Brownian motion that is
correct (or at least acceptable) from a mathematician's perspective.

Many textbook authors make it too easy for themselves when dealing with Brownian motion
by relying on intuitive approaches.¹⁵ Economic intuition may be important, but it cannot
replace the engagement with mathematical formalism. Worse, pure intuition can
even be economically flawed, as we have just shown.

Our approach is a tightrope walk. We want to present Brownian motion as
precisely as possible without overtaxing the reader with the methodology used in
mathematics. If mathematicians deal with certain problems in one way or another,
there are always good reasons for doing so, and these reasons can also be explained vividly.

Our approach is not free of problems. We cannot and will not provide a
mathematically precise text because such monographs already exist.¹⁶ We do not
concentrate on mathematical precision, nor will we deliver extensive mathematical
proofs. Instead we will present substantiated reasons why certain concepts must be
defined or derived in this way and not in any other way. Of course, what we accept
as factually justified is always subjective, and in this respect this text is also an
experiment. In any case, we believe that there is no comparable book on the market
with this type of presentation.


¹⁴ Brook Taylor (1685-1731, British mathematician).
¹⁵ In addition, what intuition means in scientific discourse is not at all clear, see Kruschwitz et al.
(2010, p. 370 ff).


¹⁶ See for example Karatzas and Shreve (1991), Huang (1989), Harrison (1990), Revuz and Yor
(1999), Musiela and Rutkowski (2005).
When writing their scientific texts, economists want readers to understand why
certain assumptions and definitions are formulated in this way and not differently.
If one looks at texts written by mathematicians, on the other hand, corresponding
efforts are usually lacking. It is often hard to understand why complex issues are
developed in exactly this way and not in any other way. Our book deals with
mathematical problems of interest to economists. Therefore, we want to try to
increase the readability of our explanations for this target group by explaining why
mathematicians often use quite complicated ways to arrive at certain results. For
example, it is not immediately obvious why one has to deal with $\sigma$-algebras in
order to be able to define the concept of measure reasonably. Nor is it possible
to understand without further explanation why the pointwise convergence of
functions is not a particularly suitable candidate for the concept of convergence.
In this book we want to present important issues in such a way that they can be
understood by readers who are not immediately familiar with the subject.

We will briefly address several ideas which deserve a thorough examination. 


Two Notations for a Brownian Motion We will begin with a statement that may
surprise economists: Eq. (1.7) is nothing but another representation of Eq. (1.3),


$$S(t) - S(0) = \int_0^t \mu S(s)\,ds + \int_0^t \sigma S(s)\,dW(s). \tag{1.7}$$


Equations (1.3) and (1.7) express exactly the same thing. Mathematicians like to speak
of stochastic differential equations or also of stochastic integral equations in this
context.¹⁷ Let it be clear that "H₂O," "dihydrogen monoxide," and "water" are one
and the same. However, when writing down chemical formulas, there are certain
rules that prescribe how to deal with the chemical elements named H and O. Thus,
"H" stands for a hydrogen atom, while "O" denotes an oxygen atom. The subscript
₂ also has a certain meaning. And it is not irrelevant whether this number is
attached to the hydrogen atom or to the oxygen atom. However, we do not want to
strain the comparison with chemical formulas here.¹⁸


¹⁷ We will go into more detail on page 9.
¹⁸ Our readers may know similar things from the field of mathematics. So you can either write


$$f'(x) = a$$

or

$$\lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = a$$

or

$$\frac{df(x)}{dx} = a.$$


It is always the same. But anyone who believes that the mathematically (mark you) perfectly correct
equation


$$df(x) = a\,dx$$
We now return to the equivalence of Eqs. (1.3) and (1.7). Usually economists are
not exposed to the form of (1.7). And that is precisely the reason why it is worth 
taking a closer look at this equation. 
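To get a first feeling for the integral form (1.7), one can look at a discrete-time analogue in which both integrals are replaced by finite sums. The sketch below uses a simple stepwise scheme (commonly called the Euler-Maruyama scheme). This scheme is not taken from the text and is only an approximation with finite steps. The function name `euler_path` and all parameter values are assumptions.

```python
import math
import random


def euler_path(s0, mu, sigma, t_end, n_steps, seed=0):
    """Step S forward and accumulate the two sums that mimic Eq. (1.7)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    dt = t_end / n_steps
    s = s0
    drift_sum = 0.0  # finite-sum stand-in for the ds-integral
    noise_sum = 0.0  # finite-sum stand-in for the dW-integral
    for _ in range(n_steps):
        dw = rng.gauss(0.0, 1.0) * math.sqrt(dt)  # discrete analogue of dW
        drift = mu * s * dt
        noise = sigma * s * dw
        drift_sum += drift
        noise_sum += noise
        s += drift + noise
    return s, drift_sum, noise_sum


# Illustrative values only (assumptions, not taken from the text).
s_end, drift_sum, noise_sum = euler_path(
    s0=100.0, mu=0.05, sigma=0.2, t_end=1.0, n_steps=1_000
)
# In the discrete scheme, S(t) - S(0) equals the two sums up to rounding.
print(s_end - 100.0, drift_sum + noise_sum)
```

In this finite setting the "differential" and the "integral" way of writing are two descriptions of the same bookkeeping. What the second sum is supposed to mean in the continuous-time limit is a separate question that the sketch does not answer.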


The Symbol $dW(s)$ The terms $dW(s)$ and $dS$ are not objects with which you can
easily carry out transformations. The "differential" $dW(s)$ is not defined in the way you
define a derivative, a limit, or an integral. This expression is found in stochastic
analysis exclusively in connection with equations of the form (1.3) or (what is
the same) equations of type (1.7). If we want to make another comparison with
chemical formulas, the subscript ₂ can prove helpful. This number only
appears in chemical formulas and it will never be placed as the very first sign in
such a representation. The reason is that the subscript is always preceded by
the chemical element in the molecule (it indicates the number of atoms). Without
any chemical element, an expression like ₂ does not make any sense. Similarly,
$dW(s)$ is inextricably linked to a stochastic integral as in (1.7).


A Known Integral What mathematical statement can be made about a stochastic
differential equation in the form of (1.7)? To this end we will take a closer look
at the two integrals on the right side of this equation. First we recognize the term


$$\int_0^t \mu S(s)\,ds. \tag{1.8}$$


This is a definite integral.¹⁹ So if $\mu S(s)$ is a "normal" function, this integral
describes the area under the function within the limits of the interval $[0, t]$. In
Fig. 1.3 we give a schematic representation of this integral. For a mathematician,
this raises a host of other questions.²⁰ In a conventional education
in economics, these questions are treated only to the depth that students need
in order to feel sufficiently safe in analyzing economic problems.


A Strange Integral It is much more complicated with the second term in Eq. (1.7)


$$\int_0^t \sigma S(s)\,dW(s). \tag{1.9}$$
0 


can be obtained by simply multiplying the last equation by $dx$ is wrong. It, too, is only another
notation for the identities mentioned, the so-called differentials. Anyone who succumbs to such
errors is also not immune to making serious mistakes when dealing with stochastic differential
equations.

¹⁹ We will talk about a Riemann integral later, see page 71.


²⁰ Examples are the following: under what conditions does this integral exist? Is the integral over a
sum equal to the sum of the individual integrals? Can any continuous function be integrated?
Fig. 1.3 The integral $\int_0^t \mu S(s)\,ds$ as the colored area under the function $\mu S(s)$ in the interval
$[0, t]$


This expression looks like a definite integral, but we will immediately understand
that it can no longer be interpreted as the area under a function as shown in Fig. 1.3.


In Fig. 1.3 we find the time $s$ on the abscissa. This makes sense because $s$ is a
variable that can assume any value from zero to infinity. $\sigma S(s)$ is also a function
that assigns a numerical value $\sigma S(s)$ to the time $s$ between zero and infinity. It
has to be emphasized that the function will not be integrated over time $s$! Instead,
the integration now takes place, as it is formally called, "over a Brownian motion
$W(s)$." For a non-mathematician this type of integration probably remains a great
mystery.

An integration over a Brownian motion could only be understood as shown in
Fig. 1.3 if the object $W(s)$ could be treated as a real number. Real numbers have
the property that they can be arranged in ascending or descending order. To
picture the real numbers, you can use a real line. In Fig. 1.3 this real line plays an
important role because it corresponds to the abscissa.

The Brownian motion $W(s)$ is anything but a real number. Rather, it is a very
large (even infinitely large) set of continuous functions that can be represented
graphically as (time-dependent) paths. To understand this in more detail, look at
Fig. 1.4, which illustrates the development of Brownian paths. In the figure you see
two possible paths. In order to establish the analogy to the classical integral, these
paths would have to be arranged on a real line. We would have to clarify which of the two
paths is further to the left or further to the right. Obviously, this is not possible.
Brownian paths simply cannot be arranged one after the other on a real line. There
is also no "smallest Brownian motion" which could correspond to zero. It remains
absolutely mysterious how one could illustrate the "abscissa" of a stochastic integral
of the form (1.3) analogous to Fig. 1.3. We will address this mystery in this book.

As indicated on page 7, we will now address the terms "stochastic differential
equation" and "stochastic integral equation." Equation (1.3) is called a stochastic differential
equation because it contains the term $dW$, while Eq. (1.7) is a stochastic integral
equation. The statement that Eqs. (1.3) and (1.7) are equivalent in content must
irritate a non-mathematician, because it is difficult to accept that a differential
equation is the same as an integral equation. The irritation goes even further
if one looks at the object $dW$ and interprets it as the "differential of the Brownian
motion." But what should the differential of a Brownian motion be? As will be shown
later, a Brownian motion is an infinitely large set of continuous functions which can
rarely be differentiated at any point.²¹ The fact that equations such as (1.3) persist
in the literature, although important terms are actually "mathematically absurd," can
only be explained by the history of this theory. Often these equations were created
by physicists and not by mathematicians. Although physicists usually manage to
avoid fundamental mathematical errors, their crude procedures are frequently put on
a solid mathematical foundation only in later years. Even when this finally succeeds, the "wrong"
notation established long ago is not banished from the everyday life of
physics.²²


Fig. 1.4 Two realizations of a Brownian motion

Readers interested in the historical background of Brownian motion are
invited to refer to Figs. 1.5 and 1.6.


²¹ See page 95.

²² A famous example is the distribution theory from physics. Before it could be represented
mathematically error-free with the help of the Schwartz spaces, the calculations of its users (above
all Oliver Heaviside) were notorious for their carelessness in formalism. Dirac wrote: "It seemed
to me that when you're confident that a certain method gives the right answer, you didn't have to
bother about rigour." Quoted from Peters (2004, p. 106).
5. On the motion of particles suspended in fluids at rest,
as required by the molecular-kinetic theory of heat;
by A. Einstein.


In this paper it will be shown that, according to the molecular-kinetic
theory of heat, bodies of microscopically visible size suspended in
liquids must, as a result of thermal molecular motion, perform
movements of such magnitude that these movements can easily be
detected with a microscope. It is possible that the motions to be
discussed here are identical with the so-called "Brownian molecular
motion"; however, the information available to me on the latter is
so imprecise that I could not form a judgment on the matter.


Fig. 1.5 It was Albert Einstein (1879-1955) who was the first to publish a physical theory of
Brownian motion, in 1905. An earlier piece of work by Louis Bachelier (1870-1946) from the year
1900, in which Brownian motions were applied to financial markets, remained entirely unnoticed
for a long time
THE observations, of which it is my object to give a summary
in the following pages, have all been made with a
simple microscope, and indeed with one and the same lens,
the focal length of which is about 1/32nd of an inch*.

The examination of the unimpregnated vegetable Ovulum,
an account of which was published early in 1826†, led me to
attend more minutely than I had before done to the structure
of the Pollen, and to inquire into its mode of action on the
Pistillum in Phenogamous plants.

In the Essay referred to, it was shown that the apex of the


* This double convex Lens, which has been several years in my possession,
I obtained from Mr. Bancks, optician, in the Strand. After I
had made considerable progress in the inquiry, I explained the nature of
my subject to Mr. Dollond, who obligingly made for me a simple pocket
microscope, having very delicate adjustment, and furnished with excellent
lenses, two of which are of much higher power than that above
mentioned. To these I have often had recourse, and with great advantage,
in investigating several minute points. But to give greater consistency to
my statements, and to bring the subject as much as possible within the reach
of general observation, I continued to employ throughout the whole of the
inquiry the same lens with which it was commenced.

† In the Botanical Appendix to Captain King's Voyages to Australia,
vol. ii. p. 534. et seq.
Fig. 1.6 Facsimile of the original article by Brown (1828). It contains neither a drawing nor a
formula
We will present the most important elements of set theory, because without
appropriate knowledge one cannot acquire a sufficient understanding of Brownian
motion. Set theory is also needed when it comes to the theory of random variables,
probability theory, information economics, or game theory. Since set theory is not
dealt with in sufficient detail in the formal training of economists, we will discuss the
required issues here.


2.1 Notation and Set Operations 

The Notion of a Set A set is a collection of various objects. If you want to describe a set
you have to specify its elements. This is done in curly brackets, where the condition
defining the elements follows a colon or a vertical line. The set


$$M := \{x \in \mathbb{R} \mid ax + bx^2 > 0\} \tag{2.1}$$


contains those real numbers $x$ that satisfy the inequality $ax + bx^2 > 0$. If a set
contains only a single element, it is called a point set.


If the numbers are real, one likes to use abbreviations in notation. The set $A$ of
numbers greater than 0 and less than 1 should actually be written in the form


$$A := \{x \in \mathbb{R} \mid 0 < x < 1\} \tag{2.2}$$

which is quite cumbersome. Instead, the more compact notation


$$A := (0, 1) \tag{2.3}$$

is used. This applies also to half-open and closed intervals.

The set of real numbers is denoted by $\mathbb{R}$, the set of natural numbers by $\mathbb{N}$, the set
of integers by $\mathbb{Z}$, and the set of rational numbers by $\mathbb{Q}$.¹ The empty set $\emptyset$ contains
no elements.

The elements are grouped together in a set, regardless of their order. So if
$\Omega$ consists of the two elements $u$ and $d$, it does not matter whether $\Omega = \{u, d\}$ or
$\Omega = \{d, u\}$ is written. In some economic applications, however, the order of the
elements is important. One then speaks of a "pair" and writes $(u, d)$ if the order
of elements is relevant. There can also be more than two entries, such as $(u, d, d, u)$.
In this case one speaks of a "tuple." If these pairs or tuples are combined into a set,
this set is no longer $\Omega$ but a new set. Depending on the length of the tuple, this new
set is called $\Omega^2$ for two entries in a pair and $\Omega^T$ for $T$ entries in the tuple.
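The difference between sets, pairs, and tuples can be illustrated with a few lines of code. The following sketch assumes a two-element set with the labels "u" and "d" and uses `itertools.product` from the Python standard library to build the sets of all pairs and of all tuples of length $T = 4$. The variable names are chosen only for illustration.

```python
from itertools import product

# A set ignores the order of its elements.
omega = {"u", "d"}
same_set = omega == {"d", "u"}  # True

# Pairs and tuples are ordered: (u, d) differs from (d, u).
different_pairs = ("u", "d") != ("d", "u")  # True

# All ordered pairs with entries from omega.
omega_2 = set(product(omega, repeat=2))

# All tuples of length T with entries from omega, here with T = 4.
T = 4
omega_T = set(product(omega, repeat=T))

print(len(omega_2), len(omega_T))  # 4 pairs and 2**4 = 16 tuples
```

The new sets contain ordered objects as their elements, which is why they are different sets from the original two-element set.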


Set Operations You can unite sets and you can calculate their intersection and 
difference. Considering two sets this means the following. 

The union contains all their elements. The symbol U is used to identify the union. 
For example, the following applies 


{1,2} = {1} U {2}. 


The intersection contains elements found in both sets. The symbol N indicates 
intersection. For example, the following is true 


Ø = {1} AN {2}. 


Sets whose intersection is empty are called disjoint. 

Let us focus on a set A. We denote this a subset of B if it only contains elements 
from B regardless whether these are all or only some elements. It is in short A C B. 
Then the following always applies 


ACB = AUB=B and AnB=A. (2.4) 


Each of the last two relationships characterizes subsets. 

The difference A\B of two sets is the set containing all elements from A 
that are not in B. For the difference rules of calculation apply which are similar 
to arithmetics.” When A and B are subsets of a set C, then the following 


¹ While there exist different opinions about whether zero is a natural number, we assume it is.
Rational numbers are regularly defined as quotients of integers $x = \frac{m}{n}$, where $m, n \in \mathbb{Z}$ and $n \neq 0$.

² For the sake of completeness, it should be noted that the representation $\Omega \times \Omega = \Omega^2$ is used for
pair formation of two different sets. The mathematically trained reader then knows that the new set
consists of the ordered pairs $(u, d)$ where $u \in \Omega$ and $d \in \Omega$.

³ The rules are not always similar. A simple rule says that if there is a difference $\setminus$, the symbols $\cap$
and $\cup$ are swapped.

Fig. 2.1 Intersection of sets $A$ and $B$


Fig. 2.2 The difference between $A \setminus B \cap C$ (left) and $A \setminus (B \cap C)$ (right)


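is valid:

$$C \setminus (A \cap B) = (C \setminus A) \cup (C \setminus B) \tag{2.5}$$
$$C \setminus (A \cup B) = (C \setminus A) \cap (C \setminus B) \tag{2.6}$$
$$C \setminus (C \setminus A) = A. \tag{2.7}$$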


One can convince oneself of the correctness of these rules with the help of the so-called
Venn diagrams⁴; we are using them here without further explanation. A Venn
diagram is a simple symbolic representation in which the sets are always indicated
by circles or ellipses. The drawing illustrates the statements on union, intersection,
and difference, see Fig. 2.1.

The graph shows, for example, that the set $A \cap B$ (the inner part of both ellipses)
represents a subset of $A \cup B$ (the totality of both ellipses), thus $A \cap B \subset A \cup B$.
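Besides drawing Venn diagrams, one can also check rules (2.5) to (2.7) by brute force on a small example. The sketch below does this for all pairs of subsets of a four-element set `C`; the choice of `C` and the helper name `all_subsets` are assumptions made for illustration. Such a check is of course no proof for arbitrary sets.

```python
from itertools import combinations

def all_subsets(base):
    """Return every subset of `base` as a frozenset."""
    items = sorted(base)
    return [frozenset(c)
            for size in range(len(items) + 1)
            for c in combinations(items, size)]

C = frozenset({1, 2, 3, 4})   # illustrative choice
subsets = all_subsets(C)

rules_hold = all(
    C - (A & B) == (C - A) | (C - B)       # rule (2.5)
    and C - (A | B) == (C - A) & (C - B)   # rule (2.6)
    and C - (C - A) == A                   # rule (2.7)
    for A in subsets
    for B in subsets
)
print(len(subsets), rules_hold)
```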


Computation Rules The following rule applies to set operations and is modeled on
the precedence rules of arithmetic: the difference takes precedence over union and intersection.
As a result, brackets that enclose a difference can be omitted. For example, one
writes


$$(A \setminus B) \cap C \quad\text{shorter}\quad A \setminus B \cap C. \tag{2.8}$$

We know that $1/2 + 3$ is something other than $1/(2 + 3)$. A similar case can be
found in Fig. 2.2. There are two terms that differ only in the brackets: $A \setminus B \cap C$ and
the set $A \setminus (B \cap C)$. The second set differs from the first in a small but not negligible
part: it additionally contains those elements of $A$ which are not in $C$.
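The effect of the brackets can be made concrete with three small sets. The numbers below are assumptions chosen only for illustration; the brackets are written out explicitly so that the sketch does not rely on Python's own operator precedence.

```python
# Illustrative sets (not from the text).
A = {1, 2, 3, 4}
B = {2, 3}
C = {3, 4, 5}

first = (A - B) & C     # the set written A\B ∩ C in (2.8)
second = A - (B & C)    # the set A\(B ∩ C)

# The part by which the second set exceeds the first one.
extra = second - first

print(first, second, extra)
print(extra == A - C)   # the extra part consists of the elements of A not in C
```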


⁴ John Venn (1834-1923, British mathematician).

If the difference $B \setminus A$ is formed, where $B = \Omega$ represents the initial set of all
elements, the result is also called the complement of $A$ and one simply writes $A^c$.

Often set operations are identified with corresponding calculation rules from 
arithmetic: the difference “looks like” subtraction, while the union reminds one 
of addition. Note, however, that there are some calculation rules that are in clear 
contrast to arithmetic. For example, the following applies to all $A$:


$$A \cap A = A \quad\text{and}\quad A \cup A = A. \tag{2.9}$$

There is no equivalent in the arithmetic of real numbers.


An Exercise We illustrate the calculation rules described using two equations. For
this purpose, we represent a union $A \cup B$ of two sets by two new sets such that the
two new sets are disjoint. This is relatively easy because


$$A \cup B = (A \setminus B) \cup B \tag{2.10}$$


must be fulfilled. We have to verify two things: first, that the union on the right is indeed identical to
$A \cup B$; second, that the two new sets are disjoint. The second condition is obviously
met, because $A \setminus B$ by definition only contains elements that are not included in $B$,
i.e.,


$$A \setminus B \cap B = \emptyset. \tag{2.11}$$


To verify the first condition precisely, one proceeds
as follows:

$$\begin{aligned}
A \setminus B \cup B &= \{x \mid x \in A \setminus B \text{ or } x \in B\} \\
&= \{x \mid (x \in A \text{ and } x \notin B) \text{ or } x \in B\} \\
&= \{x \mid x \in A \text{ or } x \in B\} \\
&= A \cup B.
\end{aligned} \tag{2.12}$$


The Venn diagram in Fig. 2.3 illustrates our considerations. Similarly, it is clear that
for any sets $A$ and $B$


$$A = (A \setminus B) \cup (A \cap B) \tag{2.13}$$


applies.

Fig. 2.3 The union of the disjoint sets $A \setminus B$ (left, blue) and $B$ results in $A \cup B$ (right)


Fig. 2.4 An infinite union of ascending subsets in the Venn diagram 
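The two decompositions from the exercise, (2.10) and (2.13), can be checked on a concrete pair of overlapping sets. This is a minimal sketch; the sets `A` and `B` are illustrative assumptions.

```python
# Two overlapping sets chosen for illustration.
A = {1, 2, 3}
B = {3, 4}

difference = A - B                                     # A\B

union_decomposition_ok = (difference | B) == (A | B)   # equation (2.10)
pieces_are_disjoint = difference.isdisjoint(B)         # equation (2.11)
A_decomposition_ok = (difference | (A & B)) == A       # equation (2.13)

print(union_decomposition_ok, pieces_are_disjoint, A_decomposition_ok)
```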


At times we have to deal with infinite operations of unions and intersections. Let
us assume that an infinite sequence of sets $A_1, A_2, \ldots$ exists.⁵ The infinite union

$$\bigcup_{n=1}^{\infty} A_n \tag{2.14}$$

is the set that contains all elements from each set $A_n$. Figure 2.4 illustrates such an
infinite union. Likewise, the intersection

$$\bigcap_{n=1}^{\infty} A_n \tag{2.15}$$

is the set which contains only those elements existing in all sets $A_n$.
As an example

$$\mathbb{N} = \bigcup_{n=0}^{\infty} \{n\} \tag{2.16}$$

applies for the set of natural numbers because the infinite union includes all natural
numbers. Likewise we have

$$\emptyset = \bigcap_{n=0}^{\infty} [n, \infty) \tag{2.17}$$

since an element that should be in all half-open intervals $[n, \infty)$ must be greater
than any natural number $n$, and such a number does not exist.⁶


⁵ Such a sequence is often written as $(A_n)_{n=1,\ldots}$.


⁶ The object $\infty$ is not a number, because you cannot use it in calculations. For example, $1 + \infty = \infty$,
from which $1 = 0$ would follow if $\infty$ were a natural number.

Another example of real numbers is a good illustration of the concept. As $A_n$ we
choose the interval $A_n := \left[\frac{1}{n}, 1 - \frac{1}{n}\right]$ with $n > 1$. If $n$ increases the interval also
increases which is why $A_n$ is a subset of $A_{n+1}$. $(A_n)_{n=1,\ldots}$ is indeed a sequence of
subsets. The limit of this sequence is then the interval

$$\bigcup_{n=1}^{\infty} \left[\frac{1}{n}, 1 - \frac{1}{n}\right] = (0, 1), \tag{2.18}$$

because each number in the open interval $(0, 1)$ lies (for sufficiently large $n$) in a set
$A_n$. Further, the boundary values 0 and 1 lie neither in the open interval $(0, 1)$ nor
in one of the sets $A_n$. We hope that both examples help to understand what is meant
by an infinite sequence of sets. 
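The interval example can be explored numerically. The sketch below searches for the first index n at which a given number enters the interval [1/n, 1 - 1/n]. The function names and the search bound `n_max` are assumptions of this illustration, and a finite search can only suggest, not prove, that 0 and 1 never enter any of the intervals. Exact fractions are used to avoid rounding effects.

```python
from fractions import Fraction

def in_A(x, n):
    """True if x lies in the closed interval A_n = [1/n, 1 - 1/n]."""
    return Fraction(1, n) <= x <= 1 - Fraction(1, n)

def first_index_containing(x, n_max=10_000):
    """Smallest n <= n_max with x in A_n, or None if no such n was found."""
    for n in range(1, n_max + 1):
        if in_A(x, n):
            return n
    return None

# Numbers inside (0, 1) are captured once n is large enough ...
print(first_index_containing(Fraction(1, 2)))
print(first_index_containing(Fraction(1, 100)))
# ... while the boundary values are not found in any A_n up to n_max.
print(first_index_containing(Fraction(0)), first_index_containing(Fraction(1)))
```

The closer a number lies to 0 or 1, the later it enters the sequence of intervals, which is what "for sufficiently large n" expresses.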


Power Set The set of all subsets of a set $\Omega$ is called the power set which is denoted
by $\mathcal{P}(\Omega)$. One has to realize that the power set is much larger than the set itself.

Think about a situation with six elements $\Omega = \{1, 2, 3, 4, 5, 6\}$. If we look at all
subsets of this finite event space, we arrive at a total of $2^6 = 64$ subsets of the event
space, namely⁷

$$\mathcal{P}(\Omega) = \{\emptyset, \{1\}, \{2\}, \ldots, \{6\}, \{1, 2\}, \{1, 3\}, \{1, 4\}, \ldots, \{1, 2, 3\}, \{1, 2, 4\}, \ldots, \{1, 2, 3, 4, 5, 6\}\}.$$

Also for infinite sets, the power set is much larger than the initial set. This is a bit
surprising, because it is not clear why one can distinguish different “levels” (more
precisely cardinalities) of infinity. We have put these considerations in Sect. 7.1.⁸
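For the finite case the power set can simply be enumerated. The following sketch lists all subsets of the six-element set and counts them by size; the variable names are illustrative.

```python
from itertools import combinations
from math import comb

omega = [1, 2, 3, 4, 5, 6]

# Collect all subsets, grouped by their number of elements.
power_set = [set(c)
             for size in range(len(omega) + 1)
             for c in combinations(omega, size)]

# Number of subsets with exactly `size` elements (binomial coefficients).
counts_by_size = [comb(len(omega), size) for size in range(len(omega) + 1)]

print(len(power_set), 2 ** len(omega))
print(counts_by_size, sum(counts_by_size))
```

Adding one more element to the base set doubles the number of subsets, which gives a first impression of how quickly the power set grows.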


2.2 Events and Sets 


In colloquial language it is said that “events occur.” But what is an event? Specific 
examples are a dice roll, the share price at the end of a trading day, or the move of 
a chess player. In economic contexts, an event often determines an economic result 
(such as a pay-out, a pay-in, a profit). 

However, an economist is usually not satisfied with the statement that this or that 
event could occur. Rather, economists calculate the expected values or variances of 
payments triggered by those events. In order to do so, mathematicians operate with 
the term “set.” To understand this, let us take a closer look at the example of a dice. 


⁷ There is a total of $\binom{6}{n}$ subsets that contain exactly $n$ elements. If we add these binomial coefficients
over all $n$, we get the result because $\sum_{n=0}^{6} \binom{6}{n} = 64$.
⁸ See page 103.

Dice Roll We can trust that everyone knows which characteristics an ideal dice has.
Let us take a closer look at several possible events related to a dice roll and identify
them with certain symbols:


$A_1$ The one was rolled.

$A_2$ An odd score was rolled.

$A_3$ The dice can no longer be found.

$A_4$ The score cannot be read.

$A_5$ A dice has been rolled without the score being noted.


We neither care whether the dice is thrown with the left hand or with the right hand
nor whether it is pushed off the table. What matters is the score on the top. Therefore,
one could describe the event of the dice roll alone by the score that appears at the 
end. This has the inestimable advantage that all conceivable events are completely 
described by six numbers. The mathematician ignores everything else. 


Thus, the events $A_3$ and $A_4$ no longer differ for the mathematician: $A_3 = A_4$.
Since obviously no scores are given, the mathematician even writes $A_3 = A_4 = \emptyset$.

For the events $A_2$ and $A_5$, the actual score is not reported. However, there is
something that distinguishes the two events from the events $A_3$ and $A_4$: while $A_3$
and $A_4$ hide the scores, $A_2$ and $A_5$ do not. Here a number was definitely determined,
but we were not told which it was. Mathematically, this is expressed for the last event
by noting all possible scores, i.e., $A_5 = \{1, 2, 3, 4, 5, 6\}$.

Let us look at event $A_1$. In this event we are sure that the score was one. But then
a mathematician uses the score to identify the event, i.e., $A_1 = \{1\}$.⁹ In the same
way, you can describe the event $A_2$ by enumerating all odd numbers. This means
$A_2 = \{1, 3, 5\}$.

We realize that $A_1$ represents a subset of $A_2$, that is $A_1 \subset A_2$. There is a very
clear interpretation for this set-theoretical representation: whenever the event $A_1$
occurs, the event $A_2$ is also true. And indeed, it is also true that the number of points
is odd when a player rolls the number one. Of course, the opposite does not hold. 
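The five dice events can be written down as Python sets. The sketch below also spells out the interpretation of the subset relation: an event "occurs" for a given score if the score is one of its elements. The helper `occurs` is a hypothetical name introduced for this illustration.

```python
omega = {1, 2, 3, 4, 5, 6}   # all scores of a single roll

A1 = {1}            # the one was rolled
A2 = {1, 3, 5}      # an odd score was rolled
A3 = set()          # the dice can no longer be found
A4 = set()          # the score cannot be read
A5 = set(omega)     # a score exists but was not noted

def occurs(event, score):
    """An event occurs if the rolled score is one of its elements."""
    return score in event

# Whenever A1 occurs, A2 occurs as well ...
a1_implies_a2 = all(occurs(A2, s) for s in omega if occurs(A1, s))
# ... but the opposite does not hold, e.g. for the score 3.
a2_implies_a1 = all(occurs(A1, s) for s in omega if occurs(A2, s))

print(A1 <= A2, a1_implies_a2, a2_implies_a1, A3 == A4)
```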


Elementary Event To prepare for the following chapter on measures, it is useful
to introduce the terms “elementary event” and “event space.”¹⁰ Both terms denote
specific sets.

Elementary events are those sets that have no “genuine subsets.” What do we 
mean by this term? Since the empty set and the set itself constitute subsets, those 
two must not be considered. All remaining sets are the genuine ones. Elementary 
events therefore contain only a single element and are the smallest events which are 
conceivable. 


⁹ You have to distinguish this notation from $A_1 = 1$. In this case $A_1$ would be a real number. In the
case of $A_1 = \{1\}$, $A_1$ is a set containing only one element (the natural number 1).

¹⁰ Anyone who is studying literature on general theory of measurement will not find these terms
there. Corresponding “objects” are called differently, because one develops a theory which is not
only concerned with probability measures.

The event space contains all events that one wants to look at. In the following we
describe the terms descriptively with the help of examples. 

To get an idea of an elementary event, imagine rolling the dice once and ask 
yourself what scores can occur. These are 1, 2, 3, 4, 5, and 6. Since these results 
cannot be broken down further, we call a set an elementary event if it contains one 
of these numbers. 

While the term is easy to understand in the context of a dice roll, it is not as simple
if we consider realizations of a share price: here, one must know which listings are
admitted on a stock exchange and if, for example, only full dollar quotes or quotes
in jumps of 10 cents are permitted. The identification of elementary events becomes
even more complicated when one thinks of the results of a parliamentary election. 


Event Space We use this term to denote the set of all elementary events, commonly
denoted by $\Omega$. It is either a finite or an infinite set.¹¹


Event One does not always only want to discuss elementary events. Rather, one
often wants to describe the effects that follow from a combination of several
elementary events: “When rolling an odd number ...” or “At a day temperature
above freezing ....” In this case we speak of composite events. Sometimes we
simply use the term event. Such an event usually represents a set of elementary
events. An event is thus an (arbitrary) subset of the event space, or $A \subset \Omega$. $A$ then
stands for a (possibly compound) event.


Example 2.1 (Multiple Dice Rolls) Set theory also allows us to characterize somewhat
more complex events. Think of games in which the dice are rolled not
once but several times in a row.¹² This is also easy to handle mathematically.
If you have three rolls, you only have to note the three numbers in a row. Now
there are two possibilities when rolling the dice several times: either the order
in which the scores appear is important or it is not. If the order is relevant, the
event would be described by a triple, e.g., $(2, 1, 2)$ with three rolls. If the order is
meaningless, the mathematician would note that the obtained numbers belong to the
set $\{1, 2\}$.
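The distinction between the ordered triple and the unordered set of scores maps directly onto Python's tuples and sets. The second sequence of rolls below is an assumption added for comparison.

```python
from itertools import product

rolls = (2, 1, 2)          # three rolls, order relevant: a triple
other_rolls = (1, 2, 2)    # illustrative second sequence with the same scores

same_as_triples = rolls == other_rolls            # order matters
same_as_sets = set(rolls) == set(other_rolls)     # order (and repetition) ignored

# All ordered outcomes of three rolls of a six-sided dice.
all_triples = list(product(range(1, 7), repeat=3))

print(set(rolls), same_as_triples, same_as_sets, len(all_triples))
```

Note that the set representation forgets not only the order but also how often a score appeared.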


Example 2.2 (Coin Toss, Once and Several Times) We will later look at a situation 
where the result will depend on the toss of a coin. Two outcomes are relevant: heads 
or tails. The chance of a coin standing on the edge is usually excluded as being 


¹¹ In probability theory $\Omega$ is often referred to as the basic set. The elements of this set are labeled
$\omega$.


¹² German children like to play “Mensch ärgere dich nicht!” (A literal translation is “do not be
annoyed.” In the UK a similar game is called “Ludo.” We are not aware whether this game allows the
same rule as described now.)

The following rule applies to this game: if a player has no meeple at all on the field (which
concerns all players at the beginning of the game), he has three attempts in each round to roll the
necessary number six in order to bring a meeple into play.

improbable.¹³ Then a coin toss can be described by an element of the set $\{u, d\}$,
where u stands for heads and d for tails. Thus the coin toss is similar to the dice roll, 
but here we have only two instead of six elementary events. 

The situation becomes a little more complex when we look at multiple coin 
tosses. We will discuss details on page 24 in Example 2.4. 


Example 2.3 (A Share Price) Assume that the prices of a share correspond to any
real nonnegative number. The event space $\Omega$ of a share price at a future point in
time thus corresponds to $\Omega = \mathbb{R}_+$.¹⁴ This event space contains an infinite number
of possible elementary events. Hence, the so-called power set is infinite, too.¹⁵ The
power set contains all (open and closed) subintervals of real numbers as well as their
unions and intersections. Such a set is extremely large.

We will use this event space again when we discuss the Lebesgue measure and 
the Stieltjes measure. 


So far we have explained these terms using the simple examples of a dice roll,
a coin toss, and a single share price. One could therefore think that the event space
$\Omega$ will always have to be constructed very simply. That is by no means the case.
To illustrate this point, we present more complicated examples. To do so it is 
necessary, however, to clarify the difference between discrete-time and continuous- 
time models. 


Discrete-Time Models If you proceed in a discrete manner, you assume that the 
share price is quoted at t = 0,1,.... There are periods between these dates in 
which no trading takes place and, as a result, no price is determined. Whether the 
time periods between the dates are long (a year) or short (a minute or a second) is 
a technical question, but not fundamental. It is crucial that the trade is interrupted 
again and again. In such models it is often assumed that the price movements from 
a point in time to the next are also of a discrete nature with a price either rising or 
falling by a (fixed) percentage. In such a case, we are dealing with a discrete-time 
model of share price development. 


While $t = 0$ denotes the present, we characterize all future times with the natural
numbers $t = 1, 2, \ldots$ up to the terminal date $T$. The terminal date can be infinite,
$T \to \infty$. In this case there is “no end of the world.”


¹³ That this case can indeed occur was shown, for example, on March 24, 1965, when FC Cologne
and FC Liverpool competed against each other in the quarter-finals of the European Football Cup.
The three matches played between the two clubs all ended in a draw. According to the rule in
existence at that time the winner had to be determined by a coin toss. When the first coin was
flipped, it stopped on its edge.

¹⁴ Our following considerations may be applied to the case in which the event space covers only an
interval of $\mathbb{R}_+$.


¹⁵ See page 107 for details.

Example 2.4 (Binomial Model) To get a vivid idea of the discrete-time concept, we
now consider a simple binomial model with a finite number $T$ of future points in time,
$T > 1$.

Let us assume that the price of a share today is $S_0$. A decision-maker may use
the idea that this price will change at any future time either by the factor $u(t)$ (for
up) or by the factor $d(t)$ (for down) with $u(t) > d(t) > 0$. The symbol $\omega_t \in \{u(t), d(t)\}$
in this simple model means nothing else than a process that causes the
previous stock price $S_{t-1}$ to change by the factor $u(t)$ or the factor $d(t)$, so that
either $S_t = S_{t-1} \cdot u(t)$ or $S_t = S_{t-1} \cdot d(t)$ applies. If one has such an $\omega_t$ in mind, one
could speak of an “elementary event of a time.” However, there is no such term in the
literature. If one examines all consecutive processes $\omega_t$ for $t = 1, 2, \ldots, T$, then one
is dealing with a vector, and exactly such a vector is meant when the literature talks
about elementary events of discrete-time models. An elementary event is therefore
a vector¹⁶

$$\omega = (\omega_1, \omega_2, \ldots, \omega_T) \in \Omega^T. \tag{2.19}$$

The event space is then the set $\Omega^T$. Unlike in the one-period model, elementary events
are no longer elements of $\Omega$ but vectors of elements of $\Omega$. Thus, events are sets of
vectors. If one specifies this for the binomial model with $T = 2$ future points in
time, four possible elementary events can be distinguished,

$$\Omega^2 = \bigl\{ (u(1), u(2)),\; (u(1), d(2)),\; (d(1), u(2)),\; (d(1), d(2)) \bigr\}. \tag{2.20}$$
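The event space of the binomial model can be enumerated for small T. The sketch below assumes, for simplicity, that the same two labels "u" and "d" are used at every point in time (the text allows time-dependent factors), and the function name `event_space` is hypothetical.

```python
from itertools import product

def event_space(T):
    """All elementary events of a T-period binomial model as tuples of moves."""
    return list(product("ud", repeat=T))

two_periods = event_space(2)
print(two_periods)

# The number of elementary events doubles with every additional period.
sizes = [len(event_space(T)) for T in range(1, 6)]
print(sizes)
```

For T = 2 the four tuples correspond to the four elementary events distinguished in (2.20).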


In some economic examples, this model is used with an infinite time horizon. For
the sake of simplicity, however, it is assumed that the factors $u$ and $d$ are constant
over time. Thus, events are determined by a sequence of $u$’s and $d$’s. Any elementary
event can be written as an infinite tuple

$$\omega = (\omega_1, \ldots) \in \{u, d\}^{\infty}. \tag{2.21}$$

Such an event space is often illustrated by a so-called binomial model. A
graphical representation is used in which the entry in the event vector (i.e., an
$\omega_t \in \{u, d\}$) is expressed by an upward or downward movement. Figure 2.5
represents such a model for the first three points in time. The particular path $uud$,
i.e., an elementary event, is highlighted. All paths are cut off at $t = 4$ and the
movements continuing into infinity are only indicated. This corresponds to a coin
toss with infinite repetition.


¹⁶ This representation is only correct if the individual points in time all have the same $\Omega$.

Fig. 2.5 Binomial model with events up to $t = 3$, the event $uud$ is highlighted


Our example provides further insights. Look at Fig. 2.5 and concentrate on the
elementary event $uud$. Note that in addition to this path there exist two further
elementary events ($udu$ and $duu$) that end in the same node. While their states at $t = 3$ are identical, the
three elementary events are not. One also speaks of a recombining binomial
model.

Such models are often used in the theory of valuing derivatives. However,
even if the paths $uud$, $udu$, and $duu$ for the underlying asset (typically a share)
result in the same payment at the time $t = 3$, this is not necessarily the case when
valuing an option on this asset. There also exist derivatives where the value of
the option at time $t = 3$ depends on the path that the underlying asset has taken,
even though it is hard to distinguish between the elementary events $uud$, $udu$, and $duu$ in
Fig. 2.5. 
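A short computation shows why the three paths end in the same state and yet have to be kept apart. The starting price and the factors below are assumptions chosen for illustration, and the running maximum serves only as a hypothetical example of a path-dependent quantity. Exact fractions avoid rounding differences between the paths.

```python
from fractions import Fraction

S0 = Fraction(100)     # assumed price today
u = Fraction(11, 10)   # assumed constant up factor
d = Fraction(9, 10)    # assumed constant down factor

def price_path(omega):
    """Prices S_0, S_1, ..., S_T along the elementary event omega, e.g. 'uud'."""
    prices = [S0]
    for move in omega:
        prices.append(prices[-1] * (u if move == "u" else d))
    return prices

paths = ["uud", "udu", "duu"]
terminal_prices = [price_path(omega)[-1] for omega in paths]
running_maxima = [max(price_path(omega)) for omega in paths]

print(terminal_prices)   # identical for all three paths
print(running_maxima)    # differs from path to path
```

The terminal price depends only on the number of up and down moves, whereas the maximum along the way depends on their order.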


Continuous-Time Models What changes when looking at a continuous-time 
model? A continuous model is based on the assumption that equity trading is never 
interrupted between two points in time. Rather, the market is trading on an ongoing 
basis. This implies that a stock price is given at any instant. The price is moving 
permanently. Furthermore, assuming that the share price can attain any value within 
an interval (however defined) the model is continuous in time and state. 

Whether one prefers discrete or continuous models has absolutely nothing to do 
with the nature of reality. Rather, it is a question of usefulness. 

Possible developments of a share price within a time interval $[0, T]$ can no longer
be described with the help of tuples or vectors $(\omega_1, \omega_2, \ldots, \omega_T)$: the number of
entries would have to be infinite regardless of the size of the time interval. A
vector has one entry (column) for each integer point in time, whereas we now need a
column for each real number. Such an object is mathematically no longer a vector
but a function.

Just as in the discrete model, an elementary event should describe the possible
development of a share price over the entire time interval. If the share price at each point in time can be
represented as a real number, then we must characterize the elementary event as a function,

$$\omega : [0, T] \to \Omega. \tag{2.22}$$

An elementary event could be either an increasing or a decreasing development of
the share price within $[0, T]$. In the first case $\omega$ would be an increasing function, and in
the second a decreasing one. Most elementary events will not have the property of
monotonicity. Instead, one will usually observe irregular ups and downs. From now
on events constitute sets of continuous functions.


Example 2.5 (Share Price Evolution) In Example 2.4, we have studied the share
prices at several future dates. In this example all time indices from today ($t = 0$)
to the final date ($t \to \infty$) are available. Future share prices will then no longer be
numbers but functions of time.

Then the event space $\Omega$ will get more elaborate. After all, a share price evolution
is a function of real numbers. We reasonably assume that this function is continuous
(i.e., shows no jumps). $\Omega$ must then contain all continuous functions $f(t) : [0, \infty) \to \mathbb{R}$.¹⁷
This set is also referred to in the literature as $C[0, \infty)$. The letter $C$
indicates “continuous.”

Anyone who wants to study the Brownian motion carefully must know that there 
are continuous functions and differentiable functions, but they are not identical. First 
of all, one can prove with little effort that a differentiable function must also be 
continuous. But the converse does not have to be true: continuous functions are not
necessarily differentiable. Using an example of Weierstraß we show on page 107 
how such functions can be constructed. The role of these functions in a Brownian 
motion will be discussed later. 

It is not a problem to imagine single elementary events of $C[0, \infty)$. In Fig. 2.6
we have shown three conceivable share price developments. One of the shown share
prices always grows at the same rate, another one fluctuates almost like a sine
function, and, finally, there is a share price development that could perhaps actually
be observed on a stock exchange. Each of these functions is an elementary event
from the set $C[0, \infty)$.
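In code, an elementary event of this kind is simply a function of time. The sketch below builds three such functions in the spirit of the three developments just described. All names, the growth rate, the grid size, the horizon, and the random seed are assumptions of this illustration; the irregular path is produced by linear interpolation between random grid values and is only defined in a meaningful way up to the chosen horizon. It is not a construction of the Brownian motion.

```python
import math
import random

def linear_path(t, rate=0.5):
    """A development that grows at the same rate all the time."""
    return rate * t

def sine_path(t):
    """A development that fluctuates like a sine function."""
    return math.sin(t)

def make_irregular_path(steps=200, horizon=10.0, seed=1):
    """Build an irregular but continuous path on [0, horizon].

    Grid values come from random up and down moves; between grid points the
    path is interpolated linearly, so it shows no jumps.
    """
    rng = random.Random(seed)
    dt = horizon / steps
    values = [0.0]
    for _ in range(steps):
        values.append(values[-1] + rng.choice((-1.0, 1.0)) * math.sqrt(dt))

    def path(t):
        pos = min(max(t, 0.0), horizon) / dt
        i = min(int(pos), steps - 1)
        return values[i] + (pos - i) * (values[i + 1] - values[i])

    return path

irregular_path = make_irregular_path()
elementary_events = [linear_path, sine_path, irregular_path]

# Each elementary event assigns a value to every point in time.
start_values = [omega(0.0) for omega in elementary_events]
print(start_values)
print([round(omega(2.5), 4) for omega in elementary_events])
```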

There is no doubt that sinusoidal or linear share price trends are highly unlikely. It 
is not at all clear how to define and measure probabilities of share price evolutions at 
this point. Although unlikely, both the linear and the sinusoidal movement cannot be 


¹⁷ We could exclude negative stock prices since shareholders are not liable, so $f(t) : [0, \infty) \to \mathbb{R}_+$.

Fig. 2.6 Three elementary events in the event space $C[0, \infty)$


excluded. By contrast, curves that are not continuous and show jumps would not be
events in the sense defined here. Equally unthinkable are share price developments
which do not move forward in time but show a “time reversal,” i.e., move back into
the past.¹⁸

We would like to emphasize that all considerations are deterministic. Although 
uncertainty exists, we do not have probabilities yet. All examples of share price 
evolutions can occur. The three events mentioned (including the “random” function) 
assume that the future values will be described by the function f(t). Probabilistic 
considerations will be introduced later. 

The Brownian motion uses the event space $\Omega = C[0, \infty)$. Usually it is assumed
that all elements of the event space start in one and the same point; for all functions
$W(t) \in C[0, \infty)$ then $W(0) = a$ applies. In the figure we have chosen $a = 0$; this
specification will later also apply to the Brownian motions.

Figure 2.6 shows three functions being continuous and starting in the same point. 
These two conditions are typical for every path of the Brownian motion. However, a 
third characteristic of paths in Brownian motion is not recognizable in Fig. 2.6 and 
will be discussed later. 


¹⁸ Such a thing is incompatible with the concept of a function.


Continuous-time theory makes use of a sophisticated functional analytical apparatus.
If you really want to understand what a Brownian motion is and how to use it,
you have no choice but to first deal with measurement theory and general integration
theory.


3.1 Basic Problem of Measurement Theory 


In everyday life it is often said that something is measured. Therefore, every reader 
probably has a certain idea of what a measure is. If you are not a mathematician, 
you might even ask yourself why you need a theory for such a “simple object” 
as a measure at all. Characteristically, a measure is a number that describes a 
property of an object, such as its volume, weight, or length. Probabilities are also 
numbers which measure something: probabilities provide information about the 
intensity with which someone expects a possible future development. They play 
a decisive role in the theory of stochastic processes. And hardly anyone will deny 
that probabilities are not quite as easy to comprehend as the distance between two 
points on a plane. 

We hope that our readers can follow us better when we state that it is necessary to 
engage in measurement theory. This theory attempts to discuss in a general way the 
properties of numbers which are intended to capture characteristics of the diverse 
objects of interest. 


Properties of Measures An elementary introduction to measurement theory could
simply be imagined in such a way that each subset of the event space is assigned a
number, namely its measure. A measure $\mu$ would then be a mapping of each subset
of $\Omega$ into the real numbers or formally¹

$$\mu : \mathcal{P}(\Omega) \to \mathbb{R}. \tag{3.1}$$


If we think of the dice again, a number has to be assigned to each of the
64 subsets. If we think of a probability measure, we would assign the relative
frequency $\frac{1}{6}$ to each elementary event of an ideal dice. A subset with $n$ elements²
has probability $\frac{n}{6}$. Unfortunately, the conditions are much more complicated when
dealing with event spaces that contain an infinite number of elements. Under these 
circumstances, the number of conceivable share prices within an arbitrarily large 
closed interval is infinite. This forces us to pursue a different approach. 
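For the finite dice example the measure can be written down directly as a function on subsets. The following is a minimal sketch; the function name `mu` and the error handling are choices made for this illustration.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))   # the six scores of an ideal dice

def mu(event):
    """Probability of an event, i.e. of a subset of omega with n elements: n/6."""
    event = frozenset(event)
    if not event <= omega:
        raise ValueError("an event must be a subset of the event space")
    return Fraction(len(event), len(omega))

print(mu({1}))          # an elementary event
print(mu({1, 3, 5}))    # an odd score
print(mu(set()), mu(omega))
```

This simple counting recipe is exactly what stops working once the event space contains infinitely many elements.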

It is obvious to demand that a measure has reasonable properties. You have to 
be careful. It can easily happen that with the formulation of desirable properties 
one gets entangled in logical contradictions without even realizing. In the following 
we will show that this is indeed the case. We will subsequently reflect on the 
conclusions to be drawn. 

To understand how readily one can get caught in contradictions, let us look at 
a specific example: we concentrate on the event space $\Omega = \mathbb{R}$ which includes
the real numbers, and try to construct a probability measure $\mu$ on $\Omega$. We will
present a number of properties that one might think of as useful or at least
unproblematic.


Existence: The first property that we want to propose seems perfectly natural. We
require that a measure $\mu(A)$ can be assigned to each set $A \subset \Omega$. Some readers
may wonder why such a trivial feature has to be mentioned at all. At the end of
this section we will see that exactly this property will turn out to be problematic.

Nonnegativity: In the introductory remarks we had suggested that a measure
could be understood as something like a volume, a length, or a probability.
Against this background it seems obvious to postulate that a measure is nonnegative,³

$$\forall A \subset \Omega \quad \mu(A) \geq 0. \tag{3.2}$$

This is immediately plausible for probabilities. If one limits oneself to classical
physics, masses and lengths will also be nonnegative. The area of the plane also
has no negative contents.⁴


¹ We have described the set of all subsets of $\Omega$ as power set $\mathcal{P}(\Omega)$ with the details being discussed
on the pages 20 ff.


² This is an event with $n$ different results from rolling a dice only once.
³ The symbol $\forall A$ means “for all $A$ applies...”


⁴ However, it is conceivable that in more advanced considerations these parameters could also
become negative. In this case, the measurement theory must be expanded. One speaks then of
the so-called signed measures, a topic we will not pursue further.

Additivity: Furthermore, we require that in the case of two disjoint subsets which
are combined, the corresponding measures must be added,

$$\forall A, B \subset \Omega \quad A \cap B = \emptyset \implies \mu(A) + \mu(B) = \mu(A \cup B). \tag{3.3}$$


The measure must be additive. This requirement will come as no surprise to 
anyone who thinks in terms of area, space, or volume. It should also apply when 
you are dealing with probabilities. In this case the prerequisite of Eq. (3.3) means 
that the events A and B are mutually exclusive. 


Before we turn to further properties of measures, we will deal with a statement
about measures that can be derived directly from (3.3).

From this condition, together with nonnegativity (3.2), it follows, for example, that a subset cannot have a larger
measure than its supersets. If $A \subset B$ applies, it follows that

$$\forall A \subset B \subset \Omega \quad B = B \setminus A \cup A \implies \mu(B) = \mu(B \setminus A) + \mu(A) \geq \mu(A). \tag{3.4}$$


A First Exercise (Additivity) In order to gain experience with measures we want
to prove that two characteristics are equivalent. We will not need the following theorem for our
further considerations. However, the proof of the theorem is suitable for a better
understanding of the interplay of the various properties of measures.⁵ We propose
the following:


Proposition 3.1 If $A$ and $B$ are two arbitrary subsets of $\Omega$, the following two
properties are equivalent:


1. The measure is additive, see Eq. (3.3).
2. For the measure applies

$$\mu(A) + \mu(B) = \mu(A \cap B) + \mu(A \cup B) \tag{3.5}$$

(for arbitrary sets!) and $\mu(\emptyset) = 0$.


The merit of Eq. (3.5) can be realized by considering Fig. 3.1. This figure shows
three separate areas. You see the set $A \setminus (A \cap B)$ on the left, $(A \cap B)$ in the middle,
and $B \setminus (A \cap B)$ on the right. Note that the intersection $(A \cap B)$ belongs to both $A$
and $B$.

Let us look at Eq. (3.5). With the sum $\mu(A) + \mu(B)$ we capture the measure of
$A$, i.e., the left as well as the middle set,

$$A = A \setminus (A \cap B) \cup (A \cap B) \tag{3.6}$$


⁵ If you want to skip this exercise, continue reading on page 32.

Fig. 3.1 Intuition of property (3.5) of a measure


and the measure of $B$, i.e., the middle and the right set,

$$B = B \setminus (A \cap B) \cup (A \cap B). \tag{3.7}$$

Obviously, the middle set $(A \cap B)$ here is “counted” twice.

Let us concentrate on the right side in Eq. (3.5). Counting is different here. In the
sum $\mu(A \cap B) + \mu(A \cup B)$ we capture the measure of $A \cup B$ and thus the measure of
the left, middle, and right set. Subsequently, the measure of $A \cap B$, i.e., the measure
of the middle set, is added. But this is exactly the same area we calculated before.

We come to the formal proof.


Proof Part $2 \implies 1$ is trivial, see Eq. (3.3). The opposite is a little more complicated.
Since (3.3) must apply to any sets $A$, $B$ we use $A = B = \emptyset$, get $\mu(\emptyset) = 0$ and thus a
part of the result. We prove the second part by referring to the exercise of the chapter
on set theory.⁶ Accordingly it follows from (2.10) that for any sets $A$ and $B$ (even if
they are not disjoint)

$$A \cup B = (A \setminus B) \cup B \tag{3.8}$$

must be fulfilled. Since the two sets on the right are disjoint, applying Eq. (3.3) gives

$$\mu(A \cup B) = \mu(A \setminus B) + \mu(B). \tag{3.9}$$

We also realize that for any sets $A$ and $B$

$$A = (A \setminus B) \cup (A \cap B) \tag{3.10}$$

and again the two sets on the right side of this equation are disjoint. Hence

$$\mu(A) = \mu(A \setminus B) + \mu(A \cap B) \tag{3.11}$$

also applies. From Eqs. (3.9) and (3.11) follows the claim, if $\mu(A \setminus B)$ is
eliminated: adding $\mu(A \cap B)$ to both sides of (3.9) and inserting (3.11) yields $\mu(A \cup B) + \mu(A \cap B) = \mu(A) + \mu(B)$. □
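On a finite event space the statement of Proposition 3.1 can also be checked by enumeration. The sketch below uses the counting measure of the ideal dice (each subset with n elements gets n/6) as an assumed example and tests both properties over all pairs of subsets. This illustrates the proposition for one particular measure and does not replace the proof.

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset(range(1, 7))
subsets = [frozenset(c)
           for size in range(len(omega) + 1)
           for c in combinations(sorted(omega), size)]

def mu(event):
    """Assumed example measure: share of the six scores contained in the event."""
    return Fraction(len(event), len(omega))

# Property 1: additivity (3.3), checked for all disjoint pairs.
additive = all(mu(A) + mu(B) == mu(A | B)
               for A in subsets for B in subsets if not (A & B))

# Property 2: equation (3.5) for arbitrary pairs, plus measure zero for the empty set.
general_rule = all(mu(A) + mu(B) == mu(A & B) + mu(A | B)
                   for A in subsets for B in subsets)
empty_set_has_measure_zero = mu(frozenset()) == 0

print(additive, general_rule, empty_set_has_measure_zero)
```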


⁶ See page 18.

$\sigma$-Additivity So far, we have restricted ourselves to the union of two, three, and in a
few cases four sets and formed their intersections and determined the associated
measures. However, the number of sets involved has always been finite. It should
have become clear how to proceed if the number of sets continues to increase, but
still remains finite. Sometimes, however, it is necessary to deal with the union of
an infinite sequence of sets and to determine its measure. It is by no means obvious
how to proceed under these circumstances. A relevant property of measures in this
context is called $\sigma$-additivity. That is what we are going to discuss now.

Consider an infinite sequence of sets A₁, A₂, ... This is supposed to be an 
ascending sequence of subsets, i.e., 

$$A_1 \subset A_2 \subset A_3 \subset \ldots \qquad (3.12)$$

Obviously, the sets grow with an increasing index. We form the infinite union, i.e., the 
set containing all elements of the Aₙ, and call it $\bigcup_{n=1}^{\infty} A_n$. Figure 2.4 
on page 19 illustrates this situation. 

Each of these sets Aₙ has the measure μ(Aₙ). What can one meaningfully 
say about the measure of $\bigcup_{n=1}^{\infty} A_n$? To answer this question, we consider 
any finite number m and break the union at m, $\bigcup_{n=1}^{m} A_n$. This set differs from 
$\bigcup_{n=1}^{\infty} A_n$ by those elements which are only contained in the “later” sets 
Aₘ₊₁, Aₘ₊₂, .... With increasing m this “residual set” gets smaller and smaller. All we 
are asking is that the measure of this residual set disappears entirely when m → ∞. 

Thus, we require that the measures μ(Aₙ) converge to the measure of the infinite 
union, 

$$A_1 \subset A_2 \subset A_3 \subset \ldots \;\Longrightarrow\; \lim_{n \to \infty} \mu(A_n) = \mu\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr). \qquad (3.13)$$


And that is exactly what σ-additivity is supposed to mean. 

Return to our interval example from page 20. We know that the sets Aₙ = [1/n, 1 − 1/n] 
“cling” as closely as possible to the open interval (0, 1) when n → ∞. Between 
these closed intervals and the limit (0, 1) there is “nothing.” There is no number in 
(0, 1) that cannot be found in one of the Aₙ. Now look at the measures μ(Aₙ). 
If the limit of these measures did not equal μ((0, 1)), then quite obviously a part of 
the measure either “disappeared” or “arose from nowhere.” Property (3.13) prevents 
exactly that. Our measure is σ-additive. 

Fig. 3.2 Pairwise disjoint sets as in Proposition 3.2 
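
The convergence required by (3.13) can be made tangible with a small computation. The sketch below is an illustration under two assumptions that are ours: the ascending sets are the closed intervals [1/n, 1 − 1/n], and the measure of an interval is its length. The helper name `interval_length` is hypothetical.

```python
from fractions import Fraction

def interval_length(n):
    """Length of the closed interval [1/n, 1 - 1/n]; zero if the interval is empty."""
    left, right = Fraction(1, n), 1 - Fraction(1, n)
    return max(right - left, Fraction(0))

# Measure still "missing" compared with the length 1 of the open interval (0, 1).
gaps = [1 - interval_length(n) for n in (2, 10, 100, 1000)]
print(gaps)
```

The gap equals 2/n and therefore shrinks towards zero, so no measure “disappears” in the limit.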


You can easily come up with a “measure” which violates condition (3.13). To 
this end, we define the following measure μ on the set of real numbers,⁷ 

$$\mu(A) = \begin{cases} 1 & \text{if } A = \mathbb{R}, \\ 0 & \text{else.} \end{cases} \qquad (3.14)$$


With this measure, the full probability is assigned only to the set of all real numbers, 
with all other sets being impossible. Now look at the sets Aₙ = (−∞, n], which contain 
all real numbers up to n. These sets form an ascending sequence. The following 
applies: 

$$\lim_{n \to \infty} \mu(A_n) = 0 \neq 1 = \mu\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr). \qquad (3.15)$$

σ-additivity does not hold. 
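
The failure of σ-additivity in (3.15) can be reproduced in a few lines. This is a minimal sketch, assuming that we only need to evaluate the measure (3.14) on intervals, which we represent as pairs `(lower, upper)`; this representation is our own simplification.

```python
import math

def mu(interval):
    """The 'measure' (3.14), restricted to intervals given as (lower, upper)."""
    lower, upper = interval
    return 1 if (lower == -math.inf and upper == math.inf) else 0

# Ascending sequence A_n = (-inf, n] for n = 1, ..., 1000.
measures = [mu((-math.inf, n)) for n in range(1, 1001)]

# The union over all n is the whole real line.
union_of_all = (-math.inf, math.inf)
print(max(measures), mu(union_of_all))
```

Every member of the sequence has measure zero, while the union has measure one, so the limit of the measures cannot equal the measure of the union.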


Another Exercise (σ-Additivity) Let us concentrate on σ-additivity a bit further.⁸ 
We just looked at a sequence of sets, each being a subset of its successor. Now we 
turn our attention to the case of an infinite number of sets that are pairwise disjoint.⁹ 
Then the following applies: 

Proposition 3.2 Let Aₙ be a sequence of pairwise disjoint sets. Furthermore, let the 
measure be additive and σ-additive. Then the following applies: 

$$\mu\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr) = \sum_{n=1}^{\infty} \mu(A_n). \qquad (3.16)$$


The prerequisite of Proposition 3.2 states that the sets of the sequence never 
overlap. To obtain a descriptive idea of what is asserted here look at Fig. 3.2. 
Proposition 3.2 states that the measure of the total set $\bigcup_{n=1}^{\infty} A_n$ is as large 
as the (infinite) sum of the individual measures μ(Aₙ). 

7 This is even a probability measure. 

8 Anyone wanting to skip the exercise may continue reading from page 35 following the material 
after the keyword “probability measure of the event space.” 

9 “Pairwise disjoint” means that every two sets (every pair, so to say) are disjoint, i.e., do not have 
a common element. 


Proof The challenge of the proof is that σ-additivity deals with ascending sets, while 
the sets under consideration are pairwise disjoint. We show how to cope with the 
pairwise disjoint sets in such a way that you end up with increasing sets. You can 
easily find such an ascending sequence by combining the first n sets Aₘ into a new 
set. 

We start with a finite number of sets and define 

$$B_n := \bigcup_{m=1}^{n} A_m. \qquad (3.17)$$

Since B₁ ⊂ B₂ ⊂ ..., the sets Bₙ represent an ascending sequence. Thus, according 
to (3.13), 

$$\mu\Bigl(\bigcup_{n=1}^{\infty} B_n\Bigr) = \lim_{n \to \infty} \mu(B_n). \qquad (3.18)$$

Remember that the union of all Bₙ is the same as the union of all Aₙ, and therefore 
we have¹⁰ 

$$\mu\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr) = \lim_{n \to \infty} \mu(B_n). \qquad (3.19)$$

Looking at (3.3) on page 31, the right side of the last equation can be written as 

$$\mu\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr) = \lim_{n \to \infty} \sum_{m=1}^{n} \mu(A_m). \qquad (3.20)$$

That was to be shown. ∎ 
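
The construction used in the proof can be traced with a concrete sequence. The following sketch is an illustration under assumptions of our own: the pairwise disjoint sets are the intervals Aₙ = [1/(n+1), 1/n), whose union is (0, 1), and the measure is the interval length. The function names are hypothetical.

```python
from fractions import Fraction

def length_a(n):
    """Length of the interval A_n = [1/(n+1), 1/n)."""
    return Fraction(1, n) - Fraction(1, n + 1)

def partial_sum(n):
    """Sum of the lengths of A_1, ..., A_n, i.e., the right side of (3.20) before the limit."""
    return sum((length_a(m) for m in range(1, n + 1)), Fraction(0))

def length_b(n):
    """Length of B_n = A_1 u ... u A_n = [1/(n+1), 1), see (3.17)."""
    return 1 - Fraction(1, n + 1)

for n in (1, 10, 100):
    print(n, partial_sum(n), length_b(n))
```

For every n the partial sum coincides with the length of Bₙ, and both approach 1, the length of the union (0, 1).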


Probability measure of the event space: In the context of probabilities it is 
reasonable to assume that the decision-maker has a complete picture of all 
conceivable events. Therefore, the probability that one of these events occurs is 
obviously one. In formal notation 

$$\mu(\Omega) = 1. \qquad (3.21)$$


10 One reason for this is that A ∪ A = A always applies. 

Shift invariance: One or two more properties will be added to those noted before. 

We request that the measure of a set remains unchanged if the set is shifted. 
It is rather difficult to get a clear idea of this property when you think of 
a probability measure. With area measures, however, the demand for shift 
invariance is immediately obvious. A circle with a certain diameter has 
the same area everywhere on the plane; and a cylinder with a certain diameter and 
height has the same volume no matter where it is located in space. 
By analogy, we require that the measure of an interval [0, 1) equals the measure 
of the shifted interval [x, x + 1) no matter how large x is. We note 

$$\forall A \subset \Omega,\ x \in \mathbb{R}: \quad \mu(A) = \mu(A + x). \qquad (3.22)$$

The reader will probably understand that area measures should be shift-invariant. 
But why this should also apply to probability measures is not obvious. We will 
address this point later. 


Contradiction Following from Our Properties After having presented the six 
properties of probability measures we get to the core of the matter. We intend to 
show the reader that a measure with the six characteristics described leads to a 
serious problem. 

To this end consider the half-open interval A = [0, 1), which, by the first property, 
must have a measure. This measure may be denoted by x := μ([0, 1)). 
Now we use the properties (3.2), (3.3), (3.13), and (3.22) to determine the measure 
of the entire real axis. We break down the real axis ℝ = Ω into infinitely many 
half-open intervals 

$$\Omega = \bigcup_{n=-\infty}^{\infty} [n, n+1). \qquad (3.23)$$


Note that these intervals are pairwise disjoint. Then it follows that 

$$\begin{aligned}
\mu(\Omega) = \mu(\mathbb{R}) &= \mu\Bigl(\bigcup_{n \in \mathbb{Z}} [n, n+1)\Bigr) && \text{due to (3.23) and definition} \\
&= \sum_{n \in \mathbb{Z}} \mu([n, n+1)) && \text{see (3.16)} \\
&= \sum_{n \in \mathbb{Z}} \mu([0, 1)) && \text{due to shift invariance (3.22)} \\
&= \sum_{n \in \mathbb{Z}} x && \text{due to the definition of } x \\
&= \begin{cases} \infty & \text{if } x > 0, \\ 0 & \text{if } x = 0. \end{cases}
\end{aligned} \qquad (3.24)$$

The following observation is decisive: regardless of the specific value x, the 
probability of the entire event space cannot be one: either the probability is infinite 
or zero. Hence, (3.24) shows the contradiction with property (3.21). 
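
The dichotomy in (3.24) can be seen by truncating the infinite sum. The sketch below is an illustration only: it sums the common value x over the 2N + 1 intervals [n, n + 1) with n = −N, ..., N. The function name `truncated_sum` and the sample values of x are our own choices.

```python
from fractions import Fraction

def truncated_sum(x, big_n):
    """Sum of x over the intervals [n, n+1) for n = -big_n, ..., big_n."""
    return sum((x for _ in range(-big_n, big_n + 1)), Fraction(0))

for big_n in (1, 10, 100):
    # A tiny positive x grows without bound; x = 0 stays at zero.
    print(big_n, truncated_sum(Fraction(1, 1000), big_n), truncated_sum(Fraction(0), big_n))
```

The truncated sum equals (2N + 1)·x, so it either grows beyond every bound or remains zero. No choice of x makes it settle at one.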


Conclusion (Measurable Sets) What conclusion must be drawn from this statement? 
Obviously, at least one of the properties mentioned above must be eliminated. 
Which of the six properties is a suitable candidate? 

Let us start with shift invariance, because we have noted that there exists no 
obvious intuition for this property. Although removing shift invariance seems to 
be a good idea, it is not sufficient. It can be shown that a contradiction can be 
constructed even if one limits oneself to the properties of nonnegativity, additivity, 
and σ-additivity. The proof of the contradiction is then, however, no longer as simple 
as above and requires a set of advanced mathematical instruments.¹¹ 

Thus, we have no choice other than to realize that the idea of assigning a measure 
to every subset cannot be maintained. The very first property of a measure that we 
developed on page 30 must be dropped. While in the finite dimensional case every 
elementary event will indeed have a probability, in the infinite dimensional case 
we must proceed with more caution. Our measurement function μ cannot assign 
a number to every subset. Instead we must start by determining those subsets that 
should be measurable at all. 

To this end the notion of a σ-algebra is introduced. There are two ways to 
approach this concept. One alternative is to restrict ourselves only to the properties 
which have to be met by measurable sets. These properties are quickly explained, 
so that we can understand the formal definition of a σ-algebra directly.¹² Another 
alternative is to provide a content-related interpretation of measurable sets which is 
often used when economists work with a σ-algebra.¹³ 


3.2 σ-Algebras and Their Formal Definition 


Mathematical Basics Remember that it is not permissible to treat every subset as 
being measurable. Therefore, it is necessary to determine what can be measured 
and what cannot be measured. In most cases this choice is arbitrary. 

If we want to use the ideas of a measure developed on pages 30 ff., we have to place 
certain minimum requirements on measurable sets. Otherwise the concept of a 
measurable set will lose its meaning. These minimum requirements result from 
mathematical considerations. 

11 This is the proposition of Banach and Tarski from 1924. It should be noted that both scholars 
could even dispense with the σ-additivity of the measure for their proof, referring only to the 
properties of nonnegativity and additivity. However, the proof of their theorem is only possible in at 
least three-dimensional space and using an axiom that is otherwise not necessary in measure 
theory (axiom of choice). Under attenuated conditions, similar paradoxes can also be constructed 
in the plane and on the straight line. 

12 We will do that in the next section. 

13 See page 40 ff. 

Formally, a σ-algebra contains all measurable sets. At a minimum, any σ-algebra 
must have the following properties: 


1. It is only natural that each measure assigns the number zero to the empty set. But 
this presupposes that the empty set is measurable. Therefore, any σ-algebra must 
contain the empty set. 

2. Correspondingly, any measure will naturally assign the number 1 to the entire 
state space. Again, this presupposes that the set Ω is measurable and must be 
contained in every σ-algebra. 

3. We had made it clear that no measure is lost when uniting disjoint sets A, B, 

$$\mu(A) + \mu(B) = \mu(A \cup B), \qquad (3.25)$$

see page 31. If the disjoint sets A and B are measurable, then consequently their 
intersection and union must also belong to the σ-algebra. 

4. Consider a set A ⊂ Ω. This set A and its complement Ω \ A are disjoint. The 
measure of the state space is 

$$\mu(\Omega) = \mu\bigl(A \cup (\Omega \setminus A)\bigr) = \mu(A) + \mu(\Omega \setminus A). \qquad (3.26)$$

Equation (3.26) implies that the complement should be included in the σ-algebra. 

5. We had several examples above in which infinite unions and intersections were 
involved. We require that for measurable sets Aₙ the infinite union 
$\bigcup_{n=1}^{\infty} A_n$ and the infinite intersection $\bigcap_{n=1}^{\infty} A_n$ are also measurable. 


The five properties listed are based on simple mathematical considerations. Before 
we interpret these properties economically we want to state the formal definition of 
a σ-algebra using the following two-step procedure. 


A Two-Step Procedure 


• The first step is to specify some sets that should be measurable. 

• The second step describes the operations that can be performed with measurable 
sets without destroying the property of measurability. These operations include 
complement, union, and intersection. 


Admittedly, this procedure is a bit cumbersome, because we have to check 
whether or not we are still dealing with a measurable set. However, it has the great 
advantage that one will not get entangled in logical contradictions. There is no 
alternative. 

Definition 3.1 (σ-Algebra) By a σ-algebra ℱ we define a set of sets with the 
following properties¹⁴: 


1. The empty set is a part of the algebra, ∅ ∈ ℱ. 
2. With any set B its complement Bᶜ = Ω \ B is included, Ω \ B ∈ ℱ. 
3. Along with B₁, B₂, ... the union $\bigcup_n B_n$ is included. 


It is also said: sets are ℱ-measurable if they are part of the σ-algebra ℱ. In this context, 
we will also refer to the properties mentioned here as construction rules or simply 
rules. 
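
For a finite event space the three construction rules can be checked mechanically. The following sketch is an illustration under the assumption that Ω is finite, so that closure under pairwise unions already implies closure under all countable unions. The function `is_sigma_algebra` is a hypothetical helper and not a library routine.

```python
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check the three rules of Definition 3.1 for a family of subsets of a finite omega."""
    omega = frozenset(omega)
    sets = {frozenset(s) for s in family}
    if any(not s <= omega for s in sets):
        return False                       # every member must be a subset of omega
    if frozenset() not in sets:
        return False                       # rule 1: the empty set is included
    if any(omega - s not in sets for s in sets):
        return False                       # rule 2: complements are included
    # Rule 3: on a finite space it suffices to test pairwise unions.
    return all(a | b in sets for a, b in combinations(sets, 2))

omega = {1, 2, 3, 4}
print(is_sigma_algebra(omega, [set(), {1, 2}, {3, 4}, omega]))   # complement pair present
print(is_sigma_algebra(omega, [set(), {1, 2}, omega]))           # complement {3, 4} missing
```

The second family fails rule 2 because the complement of {1, 2} is absent, which shows that different choices of starting sets lead to different σ-algebras or to none at all.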

The following note may be helpful. Our definition applies to any starting sets 
(subsets) of Ω. Those sets must be determined. Otherwise the properties 2 and 3 
would be meaningless. The definition will usually not result in a unique σ-algebra. 
Often, different σ-algebras will exist for a given set Ω. 

The reader may wonder why our definition contains statements about the union of 
sets, but not about their intersection. Are intersections not supposed to be included in 
the σ-algebra? The answer may come as a surprise. Intersections of sets are actually 
elements of the σ-algebra. However, we do not need to include this statement 
explicitly in Definition 3.1 because it follows from our definition. This result will 
be derived in the next paragraph. Definitions should always be as parsimonious as 
possible. 


Measurability of Intersections To verify the statement that intersections of sets 
must be ℱ-measurable when following Definition 3.1, we combine the second and third 
construction rules. If the sets B₁, B₂, ... are measurable, then by the second rule their 
complements Ω \ Bₙ are measurable, and by the third rule the union of these 
complements, $\bigcup_n (\Omega \setminus B_n)$, must be ℱ-measurable as well. However, the following 
always applies to any sets: 

$$\bigcup_n (\Omega \setminus B_n) = \Omega \setminus \bigcap_n B_n, \qquad (3.27)$$

which is illustrated by Fig. 3.3. Hence Ω \ ⋂ₙ Bₙ belongs to the σ-algebra, and by 
using rule 2 once more its complement ⋂ₙ Bₙ must also belong to the σ-algebra. It 
follows that not only the union but also the intersection of measurable 
sets is measurable. 
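
Identity (3.27) is easy to confirm on a small example. The sets below are hypothetical and chosen only for illustration; the sketch uses Python's built-in set operations.

```python
omega = frozenset(range(1, 7))
b_sets = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 4, 5}), frozenset({2, 4, 6})]

# Left side of (3.27): union of the complements.
union_of_complements = frozenset().union(*(omega - b for b in b_sets))

# Right side of (3.27): complement of the intersection.
intersection = omega.intersection(*b_sets)
complement_of_intersection = omega - intersection

print(sorted(union_of_complements), sorted(complement_of_intersection))
```

Both sides yield the same set, so the intersection can be recovered from unions and complements alone.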


Measurability of the Event Space You can observe that the event space Ω is 
ℱ-measurable. By the second construction rule, the complement Bᶜ is measurable along 
with any measurable set B. Since B ∪ Bᶜ = Ω and, according to the third rule, unions 
of measurable sets are measurable, Ω belongs to ℱ. 

There is a vivid interpretation of what measurability means. We will discuss this 
in the next section. 

14 σ-algebras are often denoted by ℱ. The symbol stands for the word “filtration.” We will 
consider filtrations in more detail in Sect. 5.5. 

Fig. 3.3 To illustrate the identity of Ω \ (A ∩ B) (left) and the union of Ω \ A and Ω \ B (both sets 
are colored blue in the images) 


3.3 Examples of Measurable Sets and Their Interpretation 
We will use three examples to illustrate our considerations. 


Example 3.1 (Coin Toss) A σ-algebra for flipping a coin has a simple shape. First 
of all, we know that the σ-algebra must contain both the empty set and the total set. 
Thus the two sets ∅ and Ω = {u, d} always belong to any σ-algebra, 

$$\emptyset \in \mathcal{F}, \quad \Omega \in \mathcal{F}.$$

In the case of tossing a coin the σ-algebra is either ℱ = {∅, Ω} (and thus represents 
the smallest conceivable algebra) or it consists of all subsets of the event space, 
ℱ = P(Ω).¹⁵ In the first case one speaks of a “trivial” σ-algebra. If you realize that 
the coin toss is the simplest uncertain situation you can imagine,¹⁶ you might not be 
surprised by this result. 


The example allows a very straightforward and easy-to-understand interpretation. 
For this purpose we want to equate measurable events with events whose occurrence 
a decision-maker can “observe.” The trivial σ-algebra would then be synonymous 
with the (almost worthless) information “a coin was tossed” without being told the 
result of the toss. 

In the second case, however, the individual events {u} and {d} are also 
measurable. This can be understood to mean that it should be verifiable whether 
the coin toss resulted in heads or tails. 
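
That the coin toss admits exactly these two σ-algebras can be confirmed by brute force. The sketch below is an illustration: it enumerates all 16 families of subsets of Ω = {u, d} and keeps those that satisfy the three construction rules. The helper `closed` is our own name.

```python
from itertools import combinations

omega = frozenset({"u", "d"})

# The four subsets of omega: {}, {d}, {u}, {u, d}.
subsets = [frozenset(c) for r in range(len(omega) + 1)
           for c in combinations(sorted(omega), r)]

def closed(family):
    """True if the family contains the empty set and is closed under complement and union."""
    return (frozenset() in family
            and all(omega - s in family for s in family)
            and all(a | b in family for a in family for b in family))

# Every possible family of subsets (2**4 = 16 candidates).
families = [frozenset(c) for r in range(len(subsets) + 1)
            for c in combinations(subsets, r)]
sigma_algebras = [f for f in families if closed(f)]
print(sorted(len(f) for f in sigma_algebras))
```

Only the trivial family with two sets and the power set with four sets survive the check.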


Example 3.2 (Dice Roll) Basically there are six possible elementary events, i.e., 
the sets {1} to {6}. But let us consider the case that a person watching the dice roll 
is only told whether an even or an odd score was obtained. Nothing else shall be 
revealed. Since it is possible to check whether the dice was rolled at all, the total 
set Ω = {1, 2, 3, 4, 5, 6} and the empty set ∅ are undoubtedly among the observable 
events. If, moreover, it is stated whether the number of points obtained was even 
or odd, the sets {1, 3, 5} and {2, 4, 6} are also observable. This makes it possible to 
define the σ-algebra in the form 

$$\mathcal{F}_1 = \bigl\{\emptyset, \{1, 3, 5\}, \{2, 4, 6\}, \{1, 2, 3, 4, 5, 6\}\bigr\}.$$

15 The P symbol denotes the power set, i.e., the set of all subsets. See page 20. 

16 After all, uncertainty can only be spoken of if there are at least two different events. 


It can easily be seen that this set indeed meets all the requirements for a σ-algebra. 
Now we extend the example and assume that the exact score will be announced. 
Then for the σ-algebra the following applies: 

$$\mathcal{F}_2 = \mathcal{P}(\{1, \ldots, 6\}),$$

where the σ-algebra is denoted by ℱ₂. Apparently, the σ-algebra consists of all 
subsets of the set {1, ..., 6}. 
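
Both σ-algebras of the dice example can be produced by repeatedly applying the construction rules to a few starting sets. The following sketch is an illustration for a finite event space; `generate_sigma_algebra` is a hypothetical helper that adds complements and pairwise unions until nothing new appears.

```python
def generate_sigma_algebra(omega, generators):
    """Close the generators under complement and union (finite omega assumed)."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for a in current:
            # Candidates produced by rule 2 (complement) and rule 3 (union).
            candidates = [omega - a] + [a | b for b in current]
            for c in candidates:
                if c not in family:
                    family.add(c)
                    changed = True
    return family

omega = range(1, 7)
f1 = generate_sigma_algebra(omega, [{1, 3, 5}])               # only odd/even is revealed
f2 = generate_sigma_algebra(omega, [{k} for k in omega])      # the exact score is revealed
print(len(f1), len(f2), f1 <= f2)
```

Starting from the single set {1, 3, 5} yields the four sets listed above, while starting from the six elementary events yields all subsets of {1, ..., 6}.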


Example 3.3 (Double Dice Roll) Consider the case where a dice is rolled twice in 
a row and the order of the scores is important. Then an elementary event can be 
described by a pair such as (1, 6). It should be possible to measure the event in 
which it is only known that the score of the second roll is exactly one point higher 
than the score of the first roll. Which exact scores (on the first and second roll) were 
achieved, however, remains hidden. Obviously, the set 

$$\{(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)\}$$

can then be measured. The complement of this set (which contains 36 − 5 = 31 
elements) is also measurable. The same applies to the empty set and Ω. Other sets 
are not measurable. 
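
The sets of this example can be written down explicitly. The sketch below is an illustration that builds the 36 ordered pairs, the event “second score is one point higher than the first,” and its complement; the variable names are our own.

```python
from itertools import product

# All ordered pairs (first roll, second roll).
omega = set(product(range(1, 7), repeat=2))

# Event: the second score is exactly one point higher than the first.
event = {(i, j) for (i, j) in omega if j == i + 1}
complement = omega - event

# The four measurable sets of the example.
sigma_algebra = [set(), event, complement, omega]
print(sorted(event), len(complement))
```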


Let us summarize our considerations. Measurable sets are mathematically 
characterized by the fact that certain operations (forming unions and complements) are 
permissible. The admissibility of these operations leads to a set of measurable sets 
which we call a σ-algebra. Every element of this algebra is called an event. Events 
contain elementary events which cannot be broken down further. An event A (a 
measurable set or an element of the σ-algebra) can be described as follows: 


Interpretation: an event A can be measured if it is possible to observe whether 
or not A has occurred. 


We can show that the above interpretation does not contradict the mathematical 
definition but rather supports it: 


1. Common sense, on which one certainly cannot always rely, tells us that for 
any event the negation of this event (“the opposite”) should also be known. If 
someone can prove in court that event A has happened, he can also disprove that 
event A did not happen. Exactly this shows up in the mathematics of a σ-algebra: 
if any set A ∈ ℱ is selected, the complement Ω \ A is included in ℱ. The second 
rule of construction in the definition of the σ-algebra thus confirms common 
sense. 

2. If events are logically linked we expect that observability is maintained. If you 
can prove whether or not the events A and B have taken place you will be able to 
tell whether or not the compound events “A and B” or “A or B” have occurred. 
This is ensured by the third construction rule in the definition of the σ-algebra. 
In our examples, the corresponding operations are transparent because the two 
logical links always yield only trivial results such as the sets themselves, the 
empty set, or Ω. We note, however, that the union and intersection of two sets are 
always part of the algebra. 


In economic contexts, instead of a σ-algebra one prefers to talk about an 
information system. However, not all algebras can be interpreted as (meaningful 
or plausible) information systems; but conversely, every information system must 
be represented by a σ-algebra. 

In summary, we can state the following: if we want to denote by σ-algebra the 
set of events known to and verifiable by a person, then each such algebra must meet 
several conditions: 


There is an event: The total set Ω is part of the σ-algebra. 

Negations are known: With every known event A ∈ ℱ the complement Ω \ A is 
also located in the σ-algebra.¹⁷ 

Or/and links are known: With the events A and B being part of the σ-algebra, 
the union A ∪ B and the intersection A ∩ B are also elements of the algebra. 


If one also embeds infinite unions into the set of conditions, the formal definition of 
a σ-algebra results.¹⁸ 

Some readers may think that there is no need to say more. That would be a 
mistake. In real life there exist situations where it is not sufficient that a person is 
informed about the existence of an event. In the case of a lawsuit, for example, this 
person must also be able to convince other parties of the occurrence of the event. It 
must be possible to provide irrefutable evidence. The event must therefore be 
verifiable by a third party. 

Finally, we would like to point out that information systems can also be related 
to one another. This can be explained by an example. With the dice roll on page 41 
we had stated that at first one could only observe whether the roll resulted in an 
even or odd score. However, in the second σ-algebra it was also possible to verify 
the precise score. If the σ-algebra can be understood as an information system, it 
should be clear that the second system is more informative than the first one. After 
all, one learns something about the precise score and not only whether the score can 
be divided by two without any remainder. This relation of the two sets of information 
can be represented mathematically simply by 

$$\mathcal{F}_1 \subset \mathcal{F}_2. \qquad (3.28)$$

Each event observable in the information system ℱ₁ can also be observed in the 
information system ℱ₂. It is also said that ℱ₂ is “finer” than ℱ₁. The opposite, of 
course, does not apply. In this way, σ-algebras naturally reflect characteristics of 
information systems that otherwise can only be described with significant formal 
efforts. 

17 If it is known that an even number was rolled, i.e., {2, 4, 6} ∈ ℱ, it is also known that an odd 
number was not rolled, i.e., {1, 3, 5} = Ω \ {2, 4, 6} ∈ ℱ. 

18 See page 39. 


3.4 Further Examples: Infinite Number of States and Times 


Key Date Principle Finance theorists often analyze models in which the present 
(t = 0) and the future (t > 0) are considered. If situations with several future 
times (t = 1, 2, ..., T) are examined, there are two possible approaches. You can 
either work with discrete-time or continuous-time models.¹⁹ Regardless of which 
approach is used, a basic principle common to both must be pointed out: 


All considerations made in the context of multi-period models take place in 
the present (t = 0). 


While being in t = 0 we think about what we now know about the future 
(t = 1, 2, ...). However, as we move in time our knowledge about the future may 
improve, but this aspect is of absolutely no relevance now (i.e., in t = 0). 


Several Points in Time In this section we will deal with more complex σ-algebras. 
They comprise either several times or an infinite number of elementary events. 


Example 3.4 (Binomial Model) We refer again to the example of the binomial 
model (see Fig. 3.4 on page 24). The model consists of exactly three points in 
time. The individual paths are described by sequences of u and d. There are a total 
of eight paths, each representing an elementary event. As can be seen, at t = 3 only 
four different results are possible: the “state” uud at t = 3 can result from three 
entirely different paths: uud, udu, and duu. 


19 For the difference between both approaches we refer the reader to pages 23 ff. 

Fig. 3.4 Binomial model with T = 3 (recombining tree with the states u, d at t = 1, the states 
uu, ud, dd at t = 2, and the states uuu, uud, udd, ddd at t = 3) 

We now turn our attention to a σ-algebra, which may consist of the measurable 
sets described below, 

$$\mathcal{F}_2 = \bigl\{\{uuu, uud\}, \{udu, udd\}, \{duu, dud\}, \{ddu, ddd\}, \ldots\bigr\}. \qquad (3.29)$$

The ... sign indicates all those sets that can be constructed by forming unions and 
intersections from the four measurable sets {uuu, uud}, {udu, udd}, {duu, dud}, 
and {ddu, ddd}. This means, for example, that the set {uuu, uud, udu, udd} and 
Ω \ {uuu, uud} are also contained in the σ-algebra. Proper nonempty subsets of the 
above four events are not included in the σ-algebra. Therefore the event {uuu} is not 
measurable. The same applies to {uud} and {udu}. 

It is also said that the σ-algebra considered here is “generated” by the four 
elements {uuu, uud}, {udu, udd}, {duu, dud}, and {ddu, ddd} mentioned above. 
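
Because the four generating sets are pairwise disjoint and together cover Ω, the generated σ-algebra consists of all unions of these blocks. The sketch below is an illustration of this fact for the binomial model; the helper names `blocks_by_prefix` and `sigma_algebra_from_partition` are our own.

```python
from itertools import combinations, product

# The eight paths uuu, uud, ..., ddd.
paths = ["".join(p) for p in product("ud", repeat=3)]

def blocks_by_prefix(t):
    """Group the paths that agree in their first t movements."""
    groups = {}
    for p in paths:
        groups.setdefault(p[:t], set()).add(p)
    return [frozenset(g) for g in groups.values()]

def sigma_algebra_from_partition(blocks):
    """All unions of blocks (including the empty union) of a partition of omega."""
    family = set()
    for r in range(len(blocks) + 1):
        for chosen in combinations(blocks, r):
            family.add(frozenset().union(*chosen))
    return family

f2 = sigma_algebra_from_partition(blocks_by_prefix(2))
print(len(f2), frozenset({"uuu"}) in f2)
```

With four blocks there are 2⁴ = 16 measurable sets, and the single path {uuu} is not among them.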

This σ-algebra can also be thought of as an information system. The only thing 
required is to understand what makes a set in this algebra measurable. Let us look, for 
example, at the two measurable sets 

{uuu, uud} and {udu, udd}. 

What do these two sets have in common and what makes them different? They each 
consist of two elementary events, and we can assign a probability to each of the two 
sets. However, the following considerations are crucial: 


1. Individual elementary events such as {uud}²⁰ are not observable. The smallest 
events that can be observed contain at least two elementary events. 

2. If a measurable set contains an elementary event, then it also contains the 
elementary event which has the same two initial movements. If the measurable 
set contains uud, it also contains uuu. And if udu is an element of a measurable 
set, then this must also apply to udd. 


We had mentioned that σ-algebras can be interpreted as information systems. 
Such an information system is constructed in a way that a decision-maker can 
distinguish precisely which upward or downward movements will have occurred 
up to t = 2. For example, at event {uuu, uud} the decision-maker is certain that two 
consecutive u-movements must have occurred; uncertainty, however, prevails with 
regard to the third movement. Similarly, at event {udu, udd}, the decision-maker is 
certain that up to t = 2 there has been one upward and one downward movement, but 
he does not know what the third movement will be. So we can present information 
about what the first two movements were, but not which movement will follow 
next. Thus, the σ-algebra contains the information we currently (t = 0) assume to 
have at t = 2, but not at time t = 3. The events which only differ in t = 3 are 
always combined in each measurable set. To summarize: this σ-algebra describes 
the information that a decision-maker today thinks he will have at t = 2. 


We will present a further example to reinforce this idea. 


Example 3.5 (Binomial Model) Let us continue with the previous example. How 
should a σ-algebra be constructed in order to describe the information a decision-maker 
will likely have in t = 1? Let us look at the elementary event udu 
and assume that it is part of a measurable set. At t = 1 the decision-maker will only 
know whether the first movement was up (u) or down (d). If the first movement was 
u, in t = 1 the decision-maker cannot yet distinguish whether this event or one of 
the three other events (udd, uuu, or uud) has occurred. Any measurable set that 
contains udu must also contain the three other events. 

Similarly, a set with event duu must also contain the three events dud, ddu, and 
ddd, because these four events are not yet distinguishable in t = 1. The generating 
sets of such a σ-algebra are therefore 

$$\mathcal{F}_1 = \bigl\{\{uuu, uud, udu, udd\}, \{duu, dud, ddu, ddd\}, \ldots\bigr\}. \qquad (3.30)$$

The sign ... is to be understood as above. However, in this simple case only two 
sets are added, namely the empty set ∅ and the total set Ω. 

20 Note that this is an event other than udu, although both paths lead to the same result at t = 3 as 
part of a recombining model. See the explanations on page 25. 


In comparing the last two examples a further reference can be made to the 
interpretation of a σ-algebra as an information system. While Example 3.5 describes 
the information available to a decision-maker at t = 1, Example 3.4 specifies the 
information that he currently believes to have at t = 2. Obviously, the information 
becomes more comprehensive as time goes by. The second σ-algebra at t = 2 is 
larger than the algebra at t = 1. Thus 

$$\mathcal{F}_1 \subset \mathcal{F}_2. \qquad (3.31)$$

It is also said that both σ-algebras form a filtration. If one examines a binomial 
model with several points in time, a σ-algebra can be formulated for each t, which 
describes the information available at t > 1 from today’s perspective. It can be 
stated that these algebras get “finer and finer,” 

$$\mathcal{F}_1 \subset \mathcal{F}_2 \subset \mathcal{F}_3 \subset \ldots \qquad (3.32)$$


Economically, this corresponds to the idea that a decision-maker gains more and 
more knowledge over time and that no information is lost with passing time. 
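
For σ-algebras generated by partitions of a finite event space, “finer” can be checked directly on the generating blocks: every block of the later partition must lie inside a block of the earlier one. The sketch below is an illustration for the binomial model with three movements; `partition_at` and `refines` are hypothetical helper names.

```python
from itertools import product

# The eight paths uuu, uud, ..., ddd.
paths = ["".join(p) for p in product("ud", repeat=3)]

def partition_at(t):
    """Blocks of paths that cannot be distinguished at time t (same first t movements)."""
    groups = {}
    for p in paths:
        groups.setdefault(p[:t], set()).add(p)
    return [frozenset(g) for g in groups.values()]

def refines(fine, coarse):
    """True if every block of `fine` is contained in some block of `coarse`."""
    return all(any(block <= big for big in coarse) for block in fine)

partitions = {t: partition_at(t) for t in (1, 2, 3)}
block_counts = {t: len(p) for t, p in partitions.items()}
print(block_counts, refines(partitions[2], partitions[1]), refines(partitions[3], partitions[2]))
```

The number of distinguishable blocks doubles with each step, and each partition refines its predecessor, which is the finite counterpart of (3.32).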


An Infinite Number of Share Prices Consider the price of a stock at a future point 
in time and assume that the event space includes not only the options u and d, but the 
set of (nonnegative) real numbers, Ω = ℝ₊. It is not easy to determine which events 
should be regarded as measurable. We will deal with this question in the following 
example. 


Example 3.6 (Share Price) For convenience we consider an event space containing 
all real numbers (and not only the nonnegative ones), i.e., Ω = ℝ. 

Proceeding in the same way as with natural numbers and assigning a positive 
probability to every conceivable value leads to a serious problem. Let us assume that 
the German DAX is measured in real numbers and all values between 8000 and 
15,000 are possible. Let us further assume that we would like to model the DAX 
as a rectangular distribution. If every real number between 8000 and 15,000 had 
the same positive probability, the sum of these probabilities would inevitably go to 
infinity and not to one. Even probabilities of zero do not avoid the problem, because 
these probabilities sum to zero and not to one. These conclusions remain valid even 
when other distributions are being used. 

For this reason we had better not start with the requirement that the sets ℕ and ℚ 
are measurable. But how should we proceed? If single numerical values must be 
unlikely, a sensible way to proceed is with intervals of numbers. As a first step 
we specify all closed intervals [a, b] for any real numbers a ≤ b as measurable. 
Subsequently we examine which other sets are measurable if we apply the construction 
rules from page 39 and proceed as follows: 


1. By letting a = b one knows immediately that the point sets {a}, which contain 
only the real number a, can be measured. 

2. If the set {a} is measurable, according to rule 2 its complement can also be 
measured. Thus, the sets ℝ \ {a} and ℝ \ {b} must be measurable. 

3. Consequently the open intervals (a, b) can also be measured. This follows from 
the fact that the intersection of ℝ \ {a} and ℝ \ {b} with [a, b] is the same as (a, b); 
and we had proven on page 39 that the intersection of any measurable sets is measurable. 

4. Since point sets are measurable, the set of all rational numbers, in short ℚ, must 
also be measurable, because it is a countable union of point sets. 
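
Returning to the DAX illustration, working with intervals instead of single values removes the difficulty described above. The sketch below is an illustration under the assumption of a rectangular distribution on [8000, 15000]; the function `prob_interval` and the constants `LOW` and `HIGH` are hypothetical names.

```python
from fractions import Fraction

LOW, HIGH = 8000, 15000   # assumed range of possible index values

def prob_interval(a, b):
    """Probability of the closed interval [a, b] under a rectangular distribution on [LOW, HIGH]."""
    a, b = max(a, LOW), min(b, HIGH)                  # clip to the possible range
    width = max(Fraction(b) - Fraction(a), Fraction(0))
    return width / (HIGH - LOW)

print(prob_interval(8000, 15000))    # the whole range
print(prob_interval(10000, 10000))   # a point set {a}, obtained by letting a = b
print(prob_interval(8000, 10000) + prob_interval(10000, 15000))
```

A single value receives probability zero, yet the interval probabilities still add up to one, because the shared endpoint 10000 carries no probability.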


All sets that can be generated with the construction mechanism used here are 
called Borel-measurable sets.²¹ One particular characteristic of these sets is the 
fact that the open intervals can be measured.²² Based on rule 3, all sets which can be 
composed of a finite or countably infinite number of open intervals are 
Borel-measurable. A union of open intervals is also called an open set. An open set is 
characterized by the fact that, together with every point x, some (however small) open 
interval around x is also part of the set. Open sets can be thought of as sets without 
“borders” (unlike the closed interval [0, 1], which contains its border points 0 and 1). 


To make matters more complex we will consider not only an infinite number of
values of a share price but also its continuous development. Handling both elements
is what the Brownian motion is all about. Let us now describe the underlying
σ-algebra.


21 Félix Edouard Justin Emile Borel (1871-1956, French mathematician). 


22 It may be hard to imagine that there exist sets that are not Borel-measurable; nevertheless they
can be constructed. However, the design specifications for such sets are highly complicated.
Fig. 3.5 Three elementary events in the event space C[0, ∞)


Example 3.7 (Share Price Evolution) With this example we will now approach
the Brownian motion. Wiener²³ was the first to describe what a measurable set in
C[0, ∞) could look like.²⁴

Since the set C[0, ∞) has an infinite number of functions, characterizing the
measurable sets is anything but a trivial task. One cannot expect the σ-algebra to
consist only of a finite number of sets of functions.

Constructive action has to be used again. In a first step, one describes specific 
measurable sets and in a second step one allows these initial sets to form their union 
or intersection. In the following we will concentrate on the first step, a task that is 
far from being elementary. The initial sets which are defined as measurable consist 
of the following continuous functions: 


First step (one point in time) We concentrate on one single point in time t > 0
and two real numbers a < b. The measurable set defined in the first step includes
all those functions whose value at time t lies in the interval [a, b]. This
is illustrated in Fig. 3.5.²⁵
At time t one can see a red vertical line running from a to b on the ordinate.
You can recognize that two of the paths intersect this vertical line. The sinusoidal
path, however, runs in such a way that it neither intersects nor touches the red
line. Now one has to consider the set of all continuous functions that go through
the red line, i.e.,


\[ Z = \{\, f : f \text{ is continuous on } [0, T] \text{ and } f(t) \in [a, b] \,\}. \tag{3.33} \]


23 Norbert Wiener (1894-1964, American mathematician). It is often said that Wiener was the first
to define what is now known as the Wiener measure. This is not entirely precise, because Wiener
published his work in 1923, but probability theory was put on an axiomatic, measure-theoretic basis
only in 1933 by Kolmogoroff. However, Wiener described in his paper how to calculate measures of
different sets of paths and is therefore with good reason called the founder of the stochastic theory
of the Brownian motion.

24 If you do not remember which event space we have designated with C[0, ∞), see page 26.


25 The same three functions were shown in Fig. 2.6 on page 27.
Fig. 3.6 Cylinder sets with two fixed points in time


The set Z being characterized by (3.33) is measurable; this property applies
regardless of how the time t and the limits a and b are chosen.²⁶

Second step (two points in time) Now we are not looking at one but at two
points in time with 0 < t < s. In addition to the numbers a and b two more
numbers c < d are given. The measurable set that is defined in the second step
includes the paths running through the interval [a, b] at t. However, there is
another requirement which plays a central role when looking at the second point
in time s. The development of the paths from our (to be defined) measurable set
Z should not be arbitrary between t and s; rather, the difference of the function
values f(s) − f(t) must belong to the interval [c, d].
It is not easy to express this statement precisely: each measurable event should
pass the interval [a, b] at time t, that is f(t) ∈ [a, b]. In addition, the
relation f(s) − f(t) ∈ [c, d] should apply for measurable events. This means
the following: if, for example, the event f(t) = x happens at t, then
measurable events at time s should pass through the interval [x + c, x + d], that
is f(s) ∈ [x + c, x + d]. The interval from which the function values originate is
shifted with each value f(t) = x.
With Fig. 3.6 we try to illustrate this aspect. It should be understood that the
position of the vertical line at time s depends on where the event passes the
vertical line relevant for time t. In other words, the larger (smaller) x is, the
higher (lower) the interval relevant at time s is located. Hence, we have not
visualized all conceivable developments. Rather, we have limited ourselves to
those developments which belong to a fixed value f(t) = x. In principle Fig. 3.6
should be extended for each x ∈ [a, b].
We must emphasize that the blue-shaded areas in Fig. 3.6 could lead to
misinterpretations because one could think that measurable events are restricted to
the blue areas at all times. Of course, one can imagine continuous functions that go
through both vertical lines and still fall outside the blue areas. Functions with
these properties also belong to the measurable set. However, at times t and s they
must have function values that are specified in


\[ f(t) \in [a, b] \quad\text{and}\quad f(s) - f(t) \in [c, d]. \tag{3.34} \]


26 Because the red line can be understood as a (very thin) cylinder, such a set is often called a
cylinder set.


Indeed, the blue areas between the times t and s have neither an upper nor a lower
bound. Now we can state that the set constructed in this way


\[ Z = \{\, f : f \text{ is continuous on } [0, T] \text{ and } f(t) \in [a, b],\ f(s) - f(t) \in [c, d] \,\} \tag{3.35} \]


must also be measurable. This property should also apply regardless of how the 
times t, s and the four numbers a, b, c, d are selected. 

Next steps These constructions have to be repeated for three, four, and any 
number of other times. However, the number of these points in time always 
remains finite. The resulting sets of continuous functions are measurable. 


As stated in Sect. 3.2, using these measurable sets one can form unions,
intersections, and complements.
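The membership conditions (3.33) and (3.35) can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name `in_cylinder_set`, the three example paths, and the chosen times and intervals are hypothetical and do not come from the text. The check takes finitely many points in time; the first interval restricts the level of the path, every further interval restricts the increment since the previous point in time.

```python
import math

def in_cylinder_set(f, times, intervals):
    """Check f(t1) in [a, b] and each increment f(t_k) - f(t_{k-1}) in its interval.

    times     -- increasing points in time t1 < t2 < ...
    intervals -- one closed interval (lo, hi) per point in time; the first
                 restricts the value f(t1), the others restrict increments.
    """
    previous = None
    for t, (lo, hi) in zip(times, intervals):
        value = f(t)
        # First time: restrict the level; later times: restrict the increment.
        quantity = value if previous is None else value - previous
        if not lo <= quantity <= hi:
            return False
        previous = value
    return True

# Three hypothetical continuous paths.
paths = {
    "linear": lambda t: 1.0 + 0.5 * t,
    "sine": lambda t: math.sin(t),
    "falling": lambda t: 2.5 - t,
}

# One point in time, as in (3.33): f(1) in [1, 2].
one_time = {name: in_cylinder_set(f, [1.0], [(1.0, 2.0)])
            for name, f in paths.items()}
# Two points in time, as in (3.35): f(1) in [1, 2] and f(2) - f(1) in [0, 1].
two_times = {name: in_cylinder_set(f, [1.0, 2.0], [(1.0, 2.0), (0.0, 1.0)])
             for name, f in paths.items()}
print(one_time)
print(two_times)
```

The falling path passes the first vertical line but its increment lies outside [0, 1], so it belongs to the set of the first step and not to the set of the second step.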


Let us summarize. The measurable sets are obtained by a two-stage process.
First, specific subsets (initial sets) are determined which should be measurable
by definition. Additional measurable sets are formed with the help of the rules
discussed above.²⁷ The resulting measurable sets may be different from each other
depending on the initial sets which are chosen in the first step.

The symbol F is used to denote the sets that form a σ-algebra. The totality of
basic set Ω, σ-algebra F, and measure μ is called the measure space (Ω, F, μ).

Usually σ-algebras are constructed as shown in our examples: we start with
specific sets and add further sets by unions, intersections, and complements. It is
said that the σ-algebra is generated by these specific sets. For example, if the
generating sets are denoted by the symbol X, one can write σ(X). In the case of
Borel-measurable sets we could use the notation σ([a, b]_{a, b ∈ ℝ}) for the σ-algebra.
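For a finite basic set the generation of a σ-algebra can be carried out mechanically. The following sketch is our own illustration and not part of the original construction: the function name `generate_sigma_algebra` is hypothetical, and the dice roll serves as an assumed example. Starting from the generating sets, complements and unions are added until nothing new appears; intersections then follow automatically from complements and unions.

```python
from itertools import combinations

def generate_sigma_algebra(omega, generators):
    """Close a family of subsets of a finite set under complement and union."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        # Rule: complements of measurable sets are measurable.
        for s in current:
            if omega - s not in family:
                family.add(omega - s)
                changed = True
        # Rule: unions of measurable sets are measurable.
        for s, t in combinations(current, 2):
            if s | t not in family:
                family.add(s | t)
                changed = True
    return family

omega = {1, 2, 3, 4, 5, 6}
# Generated by the even scores only: nothing but "even or odd" can be measured.
sigma_even = generate_sigma_algebra(omega, [{2, 4, 6}])
# Generated by all point sets: every subset becomes measurable.
sigma_points = generate_sigma_algebra(omega, [{k} for k in omega])
print(sorted(sorted(s) for s in sigma_even))
print(len(sigma_points))
```

Different generating sets lead to different σ-algebras on the same basic set, which is exactly the dependence on the initial sets described above.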

3.5 Definition of a Measure


After introducing measurable sets we will define what constitutes a measure and 
proceed in the same way as before. 


• As a first step the measure is specified for those sets which are directly
measurable.


27 See (3.2), (3.3), and (3.13) on page 30f. 
• All sets which are not directly measurable can be obtained by union, intersection,
and complement of directly measurable sets. In the case of finite unions we
use the property of additivity given in Eq. (3.3) to determine the measure of
such sets and in the case of infinite unions the property of σ-additivity given
in Eq. (3.13).²⁸


Hence, we can assign a unique number to each measurable set which we define as 
its measure. 


Definition 3.2 (σ-Algebra, Measure) A measure is a mapping of a σ-algebra F
into the real numbers


\[ \mu : \mathcal{F} \to \mathbb{R}. \tag{3.36} \]


The properties of nonnegativity according to (3.2), additivity according to (3.3), and
σ-additivity according to (3.13) are valid.


Mind that we waive the property of shift invariance. Since we will not only look
at probability measures, we allow for μ(Ω) ≠ 1. On the following pages we discuss
the construction of measures in the light of two examples (dice roll and, later, real
numbers).

Example 3.8 (Dice Roll) Let us start with the dice roll. In the following we will use
a more appropriate notation. For the set of all scores which are possible with a dice
roll, we can write

\[ \Omega = \{1, 2, 3, 4, 5, 6\}. \]

The set of even numbers is written as Ω^e and the set of odd numbers as Ω^o:


\[ \Omega^{e} = \{2, 4, 6\} \quad\text{and}\quad \Omega^{o} = \{1, 3, 5\}. \]


The matter is very simple at t = 1: the only information available is whether the
score is even or odd. We stipulate


\[ \mu(\Omega^{e}) = \mu(\Omega^{o}) = \frac{1}{2}. \]


Events such as {4}, {5}, {6} will not have their own measure because they cannot be
measured.


28 See page 33. 
At t = 2 the elementary events are also measurable. In this case it makes sense
to define the measure as follows:

\[ \mu(\{1\}) = \ldots = \mu(\{6\}) = \frac{1}{6}. \]


None of the above is particularly remarkable. 


3.6 Stieltjes Measure 


The matter gets more interesting when we look at the Borel-measurable sets of
the real line.²⁹ We had started the construction of the measurable sets with the
closed intervals [a, b]. Let us consider a monotonically increasing and differentiable³⁰
function


\[ g : \mathbb{R} \to \mathbb{R}. \tag{3.37} \]


Examples for such functions are g(a) = e^a or g(a) = Φ(a), where Φ(a) is the
distribution function of the standard normal distribution. We stipulate that


\[ \mu([a, b]) := g(b) - g(a) \tag{3.38} \]


applies. μ is also referred to as the Stieltjes measure.³¹ It is obviously defined as a
measure of closed intervals.

In the following we show that we have thereby also defined the measure of the
open intervals: from our considerations on page 46 we know that point sets and open
intervals are also measurable. For point sets, it follows directly


\[ \mu(\{a\}) = g(a) - g(a) = 0, \tag{3.39} \]


implying they have measure zero. We can furthermore write a closed interval [a, b]
as the union {a} ∪ (a, b) ∪ {b}, where the three subsets are pairwise disjoint. Hence,
because of the additivity property (3.3) of a measure,


\[
\begin{aligned}
g(b) - g(a) = \mu([a, b]) &= \mu(\{a\}) + \mu((a, b)) + \mu(\{b\}) \\
&= \mu((a, b)).
\end{aligned}
\tag{3.40}
\]


29 See page 47. 
30 It should be noted that every differentiable function is continuous.
31 Thomas Jean Stieltjes (1856-1894, Dutch mathematician).
Fig. 3.7 Stieltjes measures μ([b/10, (b+1)/10]) for different generating functions g(·) depending on b (panels: g(a) = Φ(a), g(a) = e^a, g(a) = a)


We recognize that the open interval has the same measure as the closed interval. It
is easy to conclude that the half-open intervals [a, b) and (a, b] have the measure
g(b) − g(a) too.


Example 3.9 (Real Numbers) For three specific functions g we will characterize
the resulting measures more precisely. For this purpose we first choose the
function g(a) = Φ(a), then the function g(a) = e^a, and finally g(a) = a. To
understand the measure applied over any range of the real line, we focus on the
closed interval [−1, 1] and break it down into twenty subintervals. For each of the
twenty subintervals we determine the measure μ([i/10, (i+1)/10]) with i = −10, −9, ..., +9
and plot these values. Figure 3.7 shows the effects that emerge with various
functions g.³² These are our observations:


• With g(a) = Φ(a) the measure of the interval increases as i goes to zero. The
entire real line has the measure 1.

• With g(a) = e^a the measure of the interval increases as i grows. The entire real
line has infinite measure.

• With g(a) = a the measure of the interval does not change with i changing. The
whole real line has infinite measure again.


We inevitably determine the measure of a subinterval as the difference between the
function values g((i+1)/10) − g(i/10). Therefore, it should come as no surprise that the
figures look like the first derivatives of the respective generating functions g.

In the case g(a) = a the result is also called Lebesgue measure and is denoted
as λ. It corresponds to our “common” perceptions of length units. In the other two
cases the lengths are “weighted,” whereby the weight depends on where the interval
to be measured is located on the real line.
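The numbers behind Example 3.9 can be reproduced with a short Python sketch. It is an illustration under our own assumptions: the helper names `phi` and `stieltjes_measure` are hypothetical, and Φ is computed from the error function of the standard library. For each of the three generating functions the measure g(b) − g(a) of the twenty subintervals of [−1, 1] is evaluated.

```python
import math

def phi(a):
    """Distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def stieltjes_measure(g, a, b):
    """Measure of the closed interval [a, b] according to (3.38)."""
    return g(b) - g(a)

generators = {"Phi": phi, "exp": math.exp, "identity": lambda a: a}

# Twenty subintervals [i/10, (i+1)/10] of [-1, 1], i = -10, ..., 9.
measures = {
    name: [stieltjes_measure(g, i / 10, (i + 1) / 10) for i in range(-10, 10)]
    for name, g in generators.items()
}

for name, values in measures.items():
    print(name, [round(v, 4) for v in values])
```

With the identity every subinterval receives the same measure 0.1, with the exponential function the values grow with i, and with Φ they are largest for the two subintervals next to zero.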


32 In case g(∞) = 1 and g(−∞) = 0 we have a probability measure (the entire real line has the
measure 1). The graphs would then reflect the densities. This corresponds to g(a) = Φ(a).
3.7 Dirac Measure


We will later return to an admittedly degenerate probability measure under which only
the number a is probable. In fact it is certain! All other numbers are absolutely
unlikely. This measure can be called degenerate because the numbers other than a
are impossible. The reader will subsequently understand why such a measure can
be important in the discussion about the Lebesgue integral.

To formally define the Dirac measure we again look at the real line Ω = ℝ. For
the fixed number a we use


\[ \mu([a, a]) = 1, \qquad \mu((-\infty, a)) = \mu((a, \infty)) = 0. \tag{3.41} \]


This measure is known as Dirac measure and is usually denoted by δ_a.³³
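For closed intervals the Dirac measure of (3.41) reduces to the question whether the interval contains the number a. A minimal sketch, with the hypothetical function name `dirac_measure` and arbitrarily chosen numbers:

```python
def dirac_measure(a, lower, upper):
    """Dirac measure at a of the closed interval [lower, upper]."""
    return 1 if lower <= a <= upper else 0

a = 2.0
print(dirac_measure(a, a, a))        # the point set {a}
print(dirac_measure(a, 0.0, 1.0))    # an interval that misses a
print(dirac_measure(a, -5.0, 5.0))   # an interval that contains a
```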


3.8 Null Sets and the Almost-Everywhere Property 


The sets N having no weight for a given measure μ (i.e., μ(N) = 0) are of special
importance. Such sets are also called null sets. The complement N^c = Ω\N has
full measure (which can be infinite if Ω has no finite measure).

Null sets play an important role. To understand this consider the function


\[ f(x) = n \quad\text{with } n \le x < n + 1,\ n \in \mathbb{N}. \tag{3.42} \]


This function resembles a staircase that jumps up one unit at every natural number
and remains constant between these numbers as shown in Fig. 3.8.³⁴ The function
f(x) has discontinuities at the natural numbers and is otherwise
piecewise constant. This phenomenon is nothing unusual. However, the
discontinuities must not be ignored because they are responsible for the fact that
certain mathematical operations (differentiation, limits, etc.) cannot readily be applied.
In this respect the staircase function is recalcitrant and annoying.

Null sets offer a mathematically precise way to deal with these annoying
discontinuities.³⁵ For this purpose we look at those points on the real line where
f is discontinuous, which is precisely the set of natural numbers ℕ. Although there
are infinitely many natural numbers, the entire set is rather small in comparison
to the remaining real numbers. In order to cope with the problem we look at a


33 Paul Adrien Maurice Dirac (1902-1984, British physicist).

34 The graph directly indicates the name of this function, as it actually resembles a staircase.
However, the representation is mathematically imprecise, because the function at x = 1, 2, ...
obviously does not produce a single value but an interval of values. Of course, this is not allowed
for functions. Technically speaking, the vertical lines in Fig. 3.8 should be removed.

35 It should be emphasized that the discontinuities here are only one illustrative example that
makes the discussion easy. We can also control other “unwanted” properties of
a function with null sets.




Fig. 3.8 Staircase function 


measure on the set of real numbers and try to take advantage of the described
property. This is achieved by selecting the measure so that μ(ℕ) = 0. Obviously,
the staircase function f(x) is continuous outside the set ℕ; discontinuities are only
present in ℕ, and this set now has measure zero. In our example, this permits the
statement “The staircase function f is μ-almost everywhere continuous,” because
the property (here: continuity of the function) applies everywhere except for the null
set. The trick is not to deny unwanted properties of a function, but to ignore them
by assigning them a measure that does not matter at all.

If μ were a probability measure, we would obviously ignore events that have
measure zero. These are simply unlikely events. Our above statement would then
read “The staircase function f is continuous except for unlikely events.” If μ
measures the weight of objects, we could state “The function f is continuous
except for elements without mass.” Null sets do not attempt to deny the existence
of disturbing properties of functions; rather null sets are used to disregard these
characteristics. The staircase function remains discontinuous, but the discontinuities
are unlikely, insignificant, without mass, etc., in short: a null set. We can state:


Definition 3.3 A property applies μ-almost everywhere³⁶ exactly when there is a null
set N (i.e., μ(N) = 0) such that the property applies to all elements of the
complement N^c = Ω\N.


Note that the choice of the measure plays a crucial role and it is very important
which μ is used. If two different measures μ₁ and μ₂ are defined, it is quite possible
that one and the same function is μ₁-almost everywhere continuous, while this
property is lost if μ₂ is selected. It is therefore important to choose the measure
μ skillfully.

Please also note that null sets of a measure can be very large, indeed infinitely
large. For example, it can be shown that the set of rational numbers is a null set
when a Stieltjes measure is employed. To intuitively understand the implication one
should imagine all rational numbers on a real line. If one marks a point at each of
these fractions, “almost” the entire real line will be drawn: for each real number


36 Often, the term “f is μ-almost everywhere continuous” is abbreviated by “f is μ-a.e.
continuous.”
selected one can find infinitely many rational numbers which are arbitrarily close.
Nevertheless, those numbers form a null set if a Stieltjes measure is used. Null sets
can therefore be infinitely large and still have measure zero.
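Why the rational numbers form a null set can be sketched in one line. We assume, as in Sect. 3.6, a Stieltjes measure whose point sets have measure zero according to (3.39), and we use the fact that the rational numbers can be enumerated as q_1, q_2, ... (the enumeration is our notation, not the book's). The σ-additivity (3.13) then gives:

```latex
\[
\mu(\mathbb{Q})
  = \mu\Bigl(\bigcup_{n=1}^{\infty} \{q_n\}\Bigr)
  = \sum_{n=1}^{\infty} \mu(\{q_n\})
  = \sum_{n=1}^{\infty} \bigl(g(q_n) - g(q_n)\bigr)
  = 0 .
\]
```

The same argument does not work for the real numbers as a whole, because they cannot be written as a countable union of point sets.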
Finally, we give four statements which apply almost everywhere under a specific
measure.
• The function x² is Lebesgue-almost everywhere positive.³⁷
• Each and every number is Dirac δ_a-almost everywhere equal to a.³⁸
• The absolute value function |x| can be differentiated Lebesgue-almost everywhere.³⁹
• The staircase function in Fig. 3.8 can be differentiated Lebesgue-almost everywhere.⁴⁰


37 x = 0 is the only point where the function is not positive. This set has Lebesgue measure zero.
38 The numbers that do not equal a are given by the intervals (−∞, a) and (a, ∞). This is a very
large set but its Dirac measure is nonetheless zero.

39 The number where the function cannot be differentiated is x = 0. This set has the Lebesgue
measure zero.

40 The points at which the function cannot be differentiated are the set of natural numbers. This set
has the Lebesgue measure zero.
Students of economics are confronted with random variables very early in their
programs. They encounter this term not only in statistics and econometrics
but practically in all economic subdisciplines, in particular in microeconomics and
finance. The meaning of a random variable, however, remains somewhat vague. It
is usually considered sufficient if students understand it to be data whose actual
value is not guaranteed. However, we will not remain on the surface but provide
more fundamental insights into random variables. The reader will learn that random
variables are functions with specific properties.


A Standard Example of Random Variables Assume that an experiment is carried
out where the respective daily yields of both the S&P 500 index x_1, ..., x_n and the
Apple stock y_1, ..., y_n are determined on all trading days of a year.¹ A plot of the
daily yields presented in pairs may help to support the assumption that there is a
linear correlation between the yield of the Apple stock and the S&P 500. A model
of the form


\[ y_i = \alpha + \beta x_i + \varepsilon_i \tag{4.1} \]


is used to estimate the regression line with α and β being the relevant parameters. It
is commonly assumed that the interfering (noise) terms ε_i are independent of each
other and have identical probability distributions. Typically, the interfering terms
have an expected value of E[ε_i] = 0 and a variance Var[ε_i] = σ². If the noise is
normally distributed, one usually writes ε_i ∼ N(0, σ²).
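To make the model (4.1) tangible, the following sketch simulates one year of daily yields and recovers the parameters with ordinary least squares. Everything numerical here is an assumption for illustration only: the parameter values, the volatility of the simulated index yields, the seed, and the helper name `ols_fit` are hypothetical and are not estimates for the S&P 500 or the Apple stock.

```python
import random

def ols_fit(x, y):
    """Least-squares estimates (alpha, beta) for y_i = alpha + beta * x_i + eps_i."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    beta = s_xy / s_xx
    alpha = mean_y - beta * mean_x
    return alpha, beta

rng = random.Random(42)                              # fixed seed for reproducibility
true_alpha, true_beta, sigma = 0.0005, 1.2, 0.01     # hypothetical parameters
n = 250                                              # roughly the trading days of one year

x = [rng.gauss(0.0, 0.01) for _ in range(n)]         # simulated index yields
eps = [rng.gauss(0.0, sigma) for _ in range(n)]      # independent N(0, sigma^2) noise
y = [true_alpha + true_beta * xi + e for xi, e in zip(x, eps)]

alpha_hat, beta_hat = ols_fit(x, y)
print(alpha_hat, beta_hat)
```

Note that the simulation silently takes for granted what the following pages question: the random number generator plays the role of the state space, which the regression model itself never mentions.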


While this depiction may not be a problem for most economic applications, it is 
far too simple for readers interested in a closer look at probability. 


1 The (log) daily yield of a financial asset is easily determined with the help of
ln(current rate / previous rate).
Referring to the above comments on the regression function one will notice
that the interfering terms follow a particular distribution but nothing is being
said about the underlying state space Ω. The state space was not even mentioned.
Is it infinite? Does it include the real numbers or is it a larger set such as the
space of the continuous functions C[0, ∞)? What is the relation between the states
ω ∈ Ω and the realizations observed? One would probably call this relationship
“causal,” because the state generated the realization which has occurred. While
all these questions typically remain unanswered, at best the realizations with their
probabilities are stated.

In Eq. (4.1) there exists a random variable ε_i, but it remains absolutely unclear
what the connection between “a random event” ω ∈ Ω and the “random-driven
variable” y_i looks like. We are going to clarify this causal relationship in the
following section.


4.1 Random Variables as Functions 


We can certainly state that ε_i is influenced by “randomness” and can take different
values. In order to express this relation formally, in a first step chance draws an
arbitrary element ω from the state space Ω. Second, this state ω then exerts a causal
influence. The resulting quantity ε(ω) = ε_i should always be a real number. This
allows us to regard a random variable ε as a function


\[ \varepsilon : \Omega \to \mathbb{R}. \tag{4.2} \]

We will now illustrate the view of random variables with several examples.


Example 4.1 (St. Petersburg Paradox) The St. Petersburg paradox is often
discussed in decision theory. The formalism we have presented so far is particularly
useful to describe this game.

Consider an experiment performed only once. The game master tosses a coin
until “heads” appears. The payment to the participant is given in Table 4.1 and
depends on the number of tosses required to obtain “heads” for the first time.
Although the expected value of the payment is infinite,² hardly anyone is willing
to sacrifice more than $10 to participate in the game.

A binomial model can be used to describe this game formally.
Heads are represented by u and tails by d. An elementary event is a sequence of

2 With a fair coin the two events “heads” and “tails” are equally probable. Then the expected
payment of the game is

\[ \sum_{k=1}^{\infty} 2^{k} \cdot \frac{1}{2^{k}} = 1 + 1 + 1 + \ldots = \infty. \]
Table 4.1 St. Petersburg paradox payment

First “heads”      Payment ε(ω)
After 1 toss       2^1 = 2
After 2 tosses     2^2 = 4
After 3 tosses     2^3 = 8
...                ...
Never              0


tosses, i.e., an element³

\[ \omega \in \{u, d\}^{\mathbb{N}}. \tag{4.3} \]


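If one wants to determine the number of tosses necessary for the game to end,
it is the natural number associated with the first u in state ω. Since it is at least
conceivable that no heads will ever appear, one must differentiate two cases by
defining the following function:


\[
g(\omega) :=
\begin{cases}
k, & \text{if } \exists\, k \in \mathbb{N} \text{ with } d = \omega_1 = \ldots = \omega_{k-1},\ u = \omega_k, \\
0, & \text{else.}
\end{cases}
\tag{4.4}
\]


The payment in dollars is calculated as follows:

\[ \text{payment } \varepsilon(\omega) = 2^{g(\omega)}. \tag{4.5} \]

If heads never appears, the payment is 0 according to Table 4.1.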
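The two-step construction, first the number of tosses and then the payment, can be sketched as follows. The sketch rests on an assumption that the formal model does not need: an elementary event is an infinite sequence, whereas the code only looks at a finite prefix given as a string of the letters u and d. The function names `first_heads` and `payment` are hypothetical; the case without heads follows Table 4.1 and pays nothing.

```python
def first_heads(omega):
    """Return g(omega): the toss number of the first 'u' (heads), or 0 if none occurs."""
    for k, toss in enumerate(omega, start=1):
        if toss == "u":
            return k
    return 0

def payment(omega):
    """Payment in dollars: 2**k if heads first appears at toss k, 0 if it never does."""
    k = first_heads(omega)
    return 2 ** k if k > 0 else 0

for omega in ["u", "du", "ddu", "dddd"]:
    print(omega, first_heads(omega), payment(omega))
```

Both functions map a state to a real number and are therefore candidates for random variables in the sense of (4.2).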


Example 4.2 (Dice Roll) We roll a dice and note that the payment is double the
score. In this case the random variable corresponds to the payment in dollars and
can be described as


\[ \varepsilon(\omega) = 2 \cdot \omega. \tag{4.6} \]

Here ω ranges over 1, ..., 6. Let us now discuss a more difficult example.


Example 4.3 (Continuous-Time Stock Prices) The state space consisting of the set
of all continuous functions Ω = C[0, ∞) is required for the construction of the
Brownian motion. This state space is the natural candidate for considering stock
prices that vary continuously in time. Every elementary event ω ∈ Ω is a function
of real numbers. Hence, we can also write ω : [0, ∞) → ℝ, t ↦ ω(t), with t being time.

If we want to construct a random variable for the event space Ω, we must
determine the real number which is generated by an event ω. The value of the

3 Instead of {u, d}^ℕ it is possible to write also {u, d}^∞.
random variable ω(t) (“effect”) is the realization of the event ω (“cause”) at a
predetermined time t. The random variable is denoted by


\[ \varepsilon(\omega) := \omega(t). \tag{4.7} \]

Obviously, it is the value of one of an infinite number of functions ω ∈ Ω at time t.


Example 4.4 Instead of focusing on a single point in time we are interested in the
average of all values of the function ω(·) in the interval [0, t]. In other words we
are not restricting ourselves to the value of the elementary event at time t but are
interested in the average over a finite time interval. This random variable would be
defined in the form


\[ \varepsilon(\omega) := \frac{1}{t} \int_0^t \omega(s)\, ds. \tag{4.8} \]
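Both random variables on the path space, the value at a fixed time from (4.7) and the time average from (4.8), turn a whole function into a single real number. The sketch below illustrates this with two hypothetical paths; the function names and the use of the trapezoidal rule to approximate the integral are our assumptions.

```python
import math

def value_at(omega, t):
    """Random variable of (4.7): the value of the path omega at time t."""
    return omega(t)

def time_average(omega, t, steps=10_000):
    """Random variable of (4.8): (1/t) * integral of omega over [0, t].

    The integral is approximated with the trapezoidal rule.
    """
    h = t / steps
    total = 0.5 * (omega(0.0) + omega(t))
    total += sum(omega(k * h) for k in range(1, steps))
    return total * h / t

# Two hypothetical elementary events (continuous paths).
line = lambda s: s
wave = lambda s: 100.0 + 5.0 * math.sin(s)

print(value_at(line, 2.0), time_average(line, 2.0))
print(value_at(wave, math.pi), time_average(wave, math.pi))
```

The same state ω thus produces different realizations depending on which random variable is applied to it.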


4.2 Random Variables as Measurable Functions 


Not every function is a random variable. There are two classes of functions, those 
that are random variables and those that are not. In order for functions to be called 
random variables, they must have a certain property which will be discussed on 
page 65. As a prerequisite to that discussion one has to understand why we look at 
random variables at all. 

In dealing with random variables we are primarily interested in their probabil- 
ities. However, assigning probabilities to realizations of random variables is not 
always an easy task. In the following two examples we show initially the case where 
the assignment of probabilities does not create any difficulties and subsequently 
where it will. 


Example 4.5 (Dice, Ideal and Manipulated) In the case of a dice roll one can assign
to each realization a corresponding probability, regardless of whether the dice is ideal
or manipulated. In doing so the inverse function ε^{-1} has to be considered,
and we can determine the probability of the outcome. Formally, for a specific
realization a ∈ ℝ this would be


\[ \mu(\varepsilon^{-1}(a)) : \mathbb{R} \to [0, 1]. \tag{4.9} \]


To illustrate the above let us assume that the payment after a roll is double the score.
Our mapping would be the same as in Example 4.2 on page 61


\[ \varepsilon(\omega) = 2 \cdot \omega. \tag{4.10} \]




Table 4.2 Probabilities of two dice

Dice\score      1-5      6
Ideal           1/6      1/6
Manipulated     2/15     1/3

With respect to this random variable we can specify the probabilities directly:
since the inverse function ε^{-1}(a) = a/2 exists, the corresponding probability can
be calculated easily. For each payment a = 2, 4, ..., 12 the probability amounts exactly to 1/6.

With a manipulated dice, for example, a score of six would be rolled with a higher
probability than with an ideal dice. This manipulation would be reflected in the
measure μ: the probability μ(ε^{-1}(12)) of the payment belonging to a score of six
would not be 1/6 but a higher value; correspondingly the other scores must have
lower probabilities.

To grasp this, imagine two different dice, an ideal dice and one being manipu- 
lated. With the manipulated dice the occurrence of a score of six is twice as likely as 
with the ideal dice. Since we can clearly assign a probability to each realization of 
both dice according to the following table, the dice are clearly distinguishable from 
each other (Table 4.2). 

Given the payment rule (4.10) it is easy to conclude which dice is rolled: if the 
ideal dice is rolled over and over again, the payment of $12 (equivalent to a score of 
6) will occur as often as a payment of $6 (equivalent to a score of 3); if however the 
manipulated dice is rolled, $12 are paid out much more often than $6. 
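The comparison of the two dice can be sketched with exact fractions. The probabilities of the manipulated dice are an assumption on our part: a six is taken to be twice as likely as with the ideal dice, and the remaining probability is split evenly over the other five scores. The function names are hypothetical.

```python
from fractions import Fraction

def payment(score):
    """Payment rule (4.10): double the score."""
    return 2 * score

def payment_distribution(score_probabilities):
    """Probability of each payment a, i.e. the measure of the scores mapped to a."""
    distribution = {}
    for score, probability in score_probabilities.items():
        a = payment(score)
        distribution[a] = distribution.get(a, Fraction(0)) + probability
    return distribution

ideal = {score: Fraction(1, 6) for score in range(1, 7)}
# Assumption: six is twice as likely as with the ideal dice, the rest is split evenly.
manipulated = {score: Fraction(2, 15) for score in range(1, 6)}
manipulated[6] = Fraction(1, 3)

for name, dice in [("ideal", ideal), ("manipulated", manipulated)]:
    dist = payment_distribution(dice)
    print(name, dist[12], dist[6])
```

Under these assumptions a payment of $12 is as likely as a payment of $6 with the ideal dice, but two and a half times as likely with the manipulated dice.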


As shown in the following example matters are not always as simple as illustrated 
above. 


Example 4.6 Let the state space cover the set of all real numbers, Ω = ℝ. For any
real number drawn by chance, the payment shall again be twice the real number
as postulated in Eq. (4.10), i.e., ε(ω) = 2ω. All we need to do now is to specify
how we will measure the probability of an event in ℝ. For this purpose we use the
Stieltjes measure μ introduced above,⁴ leaving the actual function g unspecified for
the moment.

Constructing the inverse function as in (4.9) and measuring the probability, we
obtain an extremely unsatisfactory result. If the state a/2 occurs, the payment a ∈ ℝ
will result. The probability that this will be the case can be determined directly:
it is simply zero because μ([a/2, a/2]) = g(a/2) − g(a/2) = 0. This result is entirely
independent of the function g chosen. One must realize that a different procedure is
required.

With the dice roll example the probabilities of the payments are always positive 
and allow us to determine whether the ideal or the manipulated dice was rolled. The 
probability of a score of 6 points and a payout of $12 is significantly higher when 
rolling the manipulated dice. 


4 See page 52.

With the real number example, however, we cannot achieve a similar result
because the probability of a particular payout is always zero, regardless of whether one uses
the function g₁ (analogous to the ideal dice) or g₂ (analogous to the manipulated dice).

The solution to the problem is not to focus on a particular realization, but on an 
interval of realizations. We no longer ask which state results exactly in the value a, 
rather we ask which states will deliver realizations between the values b and a with 
b < a. This leads to a meaningful result. We have to ask when the state w returns a 
value from the interval [b, a]. Hence 


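\[
\begin{aligned}
\mu\bigl(\varepsilon^{-1}([b, a])\bigr) &= \mu(\{\omega : \varepsilon(\omega) \in [b, a]\}) \\
&= \mu(\{\omega : 2 \cdot \omega \in [b, a]\}) \\
&= \mu\bigl(\{\omega : \omega \in [\tfrac{b}{2}, \tfrac{a}{2}]\}\bigr) \\
&= \mu\bigl([\tfrac{b}{2}, \tfrac{a}{2}]\bigr) \\
&= g\bigl(\tfrac{a}{2}\bigr) - g\bigl(\tfrac{b}{2}\bigr).
\end{aligned}
\tag{4.11}
\]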


Obviously the particular function g has a direct influence on the probability that the 
realization of the random variable lies in the interval [b, a]. 
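The result g(a/2) − g(b/2) can be checked numerically. In the sketch below the two generating functions are assumptions chosen purely for illustration: g1 is the standard normal distribution function Φ, and g2 is the same function shifted by one unit. The helper names are hypothetical.

```python
import math

def phi(x):
    """Distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_payment_in(g, b, a):
    """mu({omega : 2*omega in [b, a]}) = g(a/2) - g(b/2), following (4.11)."""
    return g(a / 2) - g(b / 2)

g1 = phi                          # hypothetical "ideal" choice
g2 = lambda x: phi(x - 1.0)       # hypothetical "manipulated" choice, shifted by one

for name, g in [("g1", g1), ("g2", g2)]:
    print(name, prob_payment_in(g, 1.0, 3.0), prob_payment_in(g, 2.0, 2.0))
```

For the interval [1, 3] the two measures give clearly different probabilities, whereas for the single payment 2 both give zero, so only intervals allow us to tell g1 and g2 apart.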


However, our proposal also has a weakness. The probability that a realization
ε(ω) will fall in the interval [b, a] depends on two variables b and a. This is a
function of two variables, and functions like these are more difficult to handle.
It makes sense to standardize the first variable b, and b → −∞ has proven to be
useful. It is a common practice to omit the equal sign in −∞ < ε(ω) ≤ a. This
finally gives us the definition used nowadays to characterize a random variable.


Using a random variable will answer the question: what is the probability of
an event leading to a realization being less than a?
For each random variable ε, we are considering the probability

\[ \mu(\{\omega : \varepsilon(\omega) < a\}). \tag{4.12} \]


This function, depending on a, is called distribution function of ε. However, we still
have to make sure that the set M := {ω : ε(ω) < a} is measurable.
Definition 4.1 (Random Variables) A function ε is called a random variable if
for each real number a the event


\[ \{\omega \in \Omega : \varepsilon(\omega) < a\} \tag{4.13} \]


is measurable. Random variables are therefore also called measurable functions.
F(a) := μ({ω ∈ Ω : ε(ω) < a}) is the distribution function of the random variable ε.
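On a finite state space Definition 4.1 can be checked by brute force. The following sketch is our own illustration: the σ-algebra `coarse` encodes the situation in which only “even or odd” is known about a dice roll, and the payout `parity_payout` as well as the function name `is_random_variable` are hypothetical.

```python
def is_random_variable(f, omega, sigma_algebra):
    """Check Definition 4.1 on a finite state space.

    It suffices to test the thresholds a that equal a value of f, plus one
    threshold above all values, because the set {w : f(w) < a} only changes there.
    """
    values = sorted({f(w) for w in omega})
    thresholds = values + [values[-1] + 1]
    return all(
        frozenset(w for w in omega if f(w) < a) in sigma_algebra
        for a in thresholds
    )

omega = frozenset({1, 2, 3, 4, 5, 6})
even, odd = frozenset({2, 4, 6}), frozenset({1, 3, 5})
coarse = {frozenset(), even, odd, omega}            # only "even or odd" is known

parity_payout = lambda w: 10 if w % 2 == 0 else 0   # hypothetical payout
double_score = lambda w: 2 * w                      # payment rule (4.10)

print(is_random_variable(parity_payout, omega, coarse))
print(is_random_variable(double_score, omega, coarse))
```

A payout that depends only on the parity of the score is measurable with respect to the coarse σ-algebra, while doubling the score is not, because the set {1} of states with a payment below 4 cannot be measured.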


The definition of the distribution function allows us to establish something 
similar to “probabilities for certain realizations.” The derivative F’ exists if the 
distribution function is differentiable. This derivative can be interpreted as the 
“weight” of the distribution function in the neighborhood of a, because 


\[ F(a+h) - F(a) \approx F'(a) \cdot h \tag{4.14} \]


applies in linear approximation. The probability of a realization of the random
variable in the interval (a, a + h) can be approximated by the product F'(a) · h.
Remember that the probability of exactly realizing a is zero. But if you depart from
the point value to a linear estimation of a sufficiently small interval, you obtain,
for differentiable distribution functions, a variable that is easy to interpret. F' is
called the density function.
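The quality of the linear approximation (4.14) can be checked numerically. The sketch below assumes, purely for illustration, a standard normally distributed random variable; this choice and the function names are not taken from the text.

```python
import math


def F(a):
    """Distribution function of a standard normal random variable (assumed example)."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def density(a):
    """Its derivative F'(a), the standard normal density."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


a = 0.5
for h in (0.1, 0.01, 0.001):
    exact = F(a + h) - F(a)      # probability of a realization in (a, a + h)
    approx = density(a) * h      # linear approximation F'(a) * h
    print(h, exact, approx, abs(exact - approx))
```

The printed error shrinks much faster than h itself, which is what one expects from a first-order approximation.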

Let us point out some facts in the context of random variables. From common
analysis one knows: adding, subtracting, or multiplying continuous functions
results in functions which remain continuous. It is useful to know whether this
property holds also for measurable functions (i.e., random variables). The following
proposition provides the answer.⁵


Proposition 4.1 (Properties of Random Variables) If X and Y are random
variables, then the sum X + Y, the product X · Y, and the ratio X/Y (with Y ≠ 0) are
also random variables.


For the purpose of brevity we omit a proof.
To enhance the understanding of random variables, three additional examples
will be presented.


Example 4.7 (Dice Roll) We refer to the dice roll example on page 41 and define
the following payout function depending on the score,

\[
f(\omega) := \begin{cases}
100, & \text{if } \omega = 1, 3, 5; \\
200, & \text{if } \omega = 4; \\
0, & \text{else.}
\end{cases}
\]


⁵ We will see later: for Riemann-integrable functions analogous relations apply. For
example, if X and Y can be integrated, then this holds also for the sum of the functions.

Because of the relationship

\[ \{\omega : f(\omega) < 200\} = \{1, 2, 3, 5, 6\} = \Omega \setminus \{4\}, \]


the function f is not F₁-measurable since this set does not belong to the σ-algebra
F₁. At time t = 1, the function f is therefore not a random variable. Based on the
knowledge available at time t = 1 it is not possible to decide how high the payout
associated with f will be. One learns only at time t = 2 whether event {4} or event
{2, 6} has occurred.

Since the σ-algebra F₂ includes all subsets of the possible number of points,
the payout function is F₂-measurable. Thus, the function f represents a random
variable at t = 2. Now you can see whether a 4, another even number, or any other
number has been rolled.
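The measurability check of this example can be sketched in a few lines. We assume here that the information at t = 1 consists of the odd/even distinction only, so that F₁ contains the empty set, the odd scores, the even scores, and the full state space, while F₂ is the power set; the helper `is_measurable` is a hypothetical name introduced for illustration.

```python
from itertools import chain, combinations

OMEGA = frozenset(range(1, 7))

# Assumption: at t = 1 only odd and even scores can be told apart.
F1 = {frozenset(), frozenset({1, 3, 5}), frozenset({2, 4, 6}), OMEGA}
# At t = 2 every subset of OMEGA is observable (the power set).
F2 = {frozenset(s) for s in chain.from_iterable(
    combinations(sorted(OMEGA), r) for r in range(len(OMEGA) + 1))}


def f(omega):
    """Payout function of Example 4.7."""
    if omega in (1, 3, 5):
        return 100
    if omega == 4:
        return 200
    return 0


def is_measurable(func, sigma_algebra):
    """Check that {omega : func(omega) < a} lies in the sigma-algebra for all a.

    On a finite space it suffices to test one threshold per distinct
    function value plus one threshold above the maximum.
    """
    values = sorted({func(w) for w in OMEGA})
    thresholds = values + [values[-1] + 1]
    return all(
        frozenset(w for w in OMEGA if func(w) < a) in sigma_algebra
        for a in thresholds
    )


print(is_measurable(f, F1))  # False
print(is_measurable(f, F2))  # True
```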


Example 4.8 Using the same dice roll example we will now consider how a function
must be constructed to be a random variable at time t = 1. Intuitively, the answer
is clear: since one can only distinguish odd and even scores at this point in time, a
function is only measurable if it returns identical payouts for all even and all odd
scores respectively. Thus, a function of the form

\[
f(\omega) = \begin{cases}
o, & \text{at odd scores } \omega, \\
e, & \text{at even scores } \omega,
\end{cases}
\tag{4.15}
\]

with o ≠ e will ensure that f is measurable at t = 1.⁶

This intuition-based statement will now be proven formally for o > e. To do so
we have to show that the set M := {ω : f(ω) < a} is measurable for any real
number a. With a given a we can distinguish four conceivable cases.⁷


Case 1: a ≤ o and a ≤ e. Since neither the even nor the odd numbers provide function
values below a, the set remains empty, M = ∅. This set can be measured
according to the above stated prerequisite.

Case 2: a ≤ o and a > e. With this choice of a, the set M captures the even
scores. The odd scores do not belong to M. This set can also be measured.

Case 3: a > o and a ≤ e. Only the odd scores are captured in M, while the even
scores do not belong to M. This set is measurable.

Case 4: a > o and a > e. Both even and odd numbers return function values
below a. Therefore, all even and all odd numbers are in the set M, which
therefore includes all rolls. This set is also measurable.


⁶ The reader is encouraged to explore other functions capable of guaranteeing measurability.


⁷ Depending on the numbers o and e, one of the four cases identified may not even occur. For
example, if e = 0 and o = 1, there is no a to satisfy case 3.

The set M is measurable in all conceivable cases. Therefore, it can be stated that the
function f is measurable and thus represents a random variable.
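The case distinction can be replayed with concrete numbers. The following sketch uses the hypothetical payouts o = 2 and e = 1 (any pair with o > e would do) and again assumes that the information at t = 1 consists of the odd/even distinction only.

```python
ODD, EVEN = frozenset({1, 3, 5}), frozenset({2, 4, 6})
OMEGA = ODD | EVEN
F1 = {frozenset(), ODD, EVEN, OMEGA}  # assumed information at t = 1

o, e = 2.0, 1.0  # hypothetical payouts with o > e


def f(omega):
    """Payout o at odd scores and e at even scores, as in (4.15)."""
    return o if omega in ODD else e


def sublevel_set(a):
    """M = {omega : f(omega) < a}."""
    return frozenset(w for w in OMEGA if f(w) < a)


# One threshold for each case that can occur when o > e (cases 1, 2, and 4).
for a in (0.5, 1.5, 2.5):
    M = sublevel_set(a)
    print(a, sorted(M), M in F1)
```

With o > e the third case never occurs, so only the empty set, the even scores, and the full state space appear as M.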


Example 4.9 We consider the Borel-measurable sets on the real line and look for
functions f : ℝ → ℝ that are random variables. According to our definition this
implies that the set

\[ A := \{x : f(x) < a\} \tag{4.16} \]

must be measurable or, as one might say, belong to the Borel σ-algebra.

Restricting ourselves to continuous functions f implies: if a point x belongs to
the set A, i.e., f(x) < a holds, then there exists also a (possibly small) interval
around x in A. Given continuity it follows that f(x + δ) < a also applies.
Hence, the set A is an open set and thus Borel-measurable.⁸ We can summarize: all
continuous functions are random variables; however, the existence of further
Borel-measurable functions is not excluded.


4.3 Distribution Functions 


Random variables are measurable functions and are described by their distribution
functions. Describing such functions in full for a specific case can be very
time-consuming. In order to get at least a rough idea of a distribution function it is
common to characterize it by its moments.

The most important moment is the expectation,⁹ which can be illustrated by
returning to Example 4.2 from page 61. We are interested in the payout a participant
could realize on average if this game was played very often (strictly speaking:
infinitely often). The amount in question is calculated by weighting the random
payouts with their probabilities and adding them over all conceivable states. Hence,

\[ E[X] = \sum_{\omega=1}^{6} X(\omega) \cdot \frac{1}{6} = \frac{2 + 4 + \ldots + 12}{6} = 7. \tag{4.17} \]


The average payment of this game therefore amounts to $7. The distribution 
function in our dice example is very straightforward. 
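The weighting of payouts with probabilities can be reproduced directly. Example 4.2 is not repeated here, so the sketch assumes payouts of 2, 4, ..., 12 for the scores 1 to 6, i.e., X(ω) = 2ω, which is consistent with the average payment of $7 stated above.

```python
from fractions import Fraction


def payout(omega):
    """Assumed payout X(omega) = 2 * omega (an illustrative reading of Example 4.2)."""
    return 2 * omega


probability = Fraction(1, 6)  # fair die: each state is equally likely
expectation = sum(payout(omega) * probability for omega in range(1, 7))
print(expectation)  # 7
```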

Unfortunately, determining expected values is not always easy. Calculating the
expected value is far more difficult when dealing with a random variable X, for
which X : Ω → ℝ applies. Since the number of possible realizations is infinitely
large and the probability of a specific realization is zero, an integral replaces the
sum.


⁸ See page 47.
⁹ Further moments are addressed in standard textbooks on statistics, for example Mood et al. (1974),
chapter 4, or Hogg et al. (2013, p. 52 ff).




With the Brownian motion we are dealing with the state space Ω = C[0, ∞),
i.e., the set of all continuous functions starting at zero. However, anyone wanting
to determine the expected value quickly gets into considerable difficulties with the
Riemann integral known from high school mathematics. In the following chapter
we will show the reason for these difficulties and how they can be overcome.

5.1 Definition of Expectation: A Problem


In the previous chapter we dealt with the concept of probability in the context of any
event space Ω. We described how to proceed appropriately to define a probability as
a measure of a set. Now we are focusing on the determination of expectations and
variances.

Why is the calculation of expected values a problem at all? Let us continue with
the example of the dice roll. There are six possible states, and to each of them the
random variable assigns a realization X(ω). The expectation of this random variable
can now be determined very easily by multiplying each realization by the probability
of its occurrence and then adding the six values,


\[ E[X] := \sum_{\omega=1}^{6} X(\omega) \cdot \frac{1}{6}. \tag{5.1} \]


Calculating the expectation gets more complicated when dealing with a larger state
space like the real numbers, Ω = ℝ. A summation of the form

\[ \sum_{x \in \mathbb{R}} \tag{5.2} \]

simply will not work since the real numbers cannot be enumerated exhaustively.¹
The summation rule does not make any sense.

One might be inclined to use the Riemann² integral as a sensible alternative.
Before realizing that this does not work either, we will discuss the construction of
the Riemann integral in necessary detail. For ease of presentation we will restrict
the discussion to strictly monotonically increasing functions over the real numbers,

\[ f : \mathbb{R} \to \mathbb{R}. \tag{5.3} \]

¹ For an explanation see page 106 f.
² Georg Friedrich Bernhard Riemann (1826-1866, German mathematician).


5.2 Riemann Integral 


The definite Riemann integral over the interval [a, b] is constructed by splitting this
interval into many small subintervals [tᵢ, tᵢ₊₁]. The index i runs from i = 1 to i = n.
A rectangle with a width of tᵢ₊₁ − tᵢ is placed over each subinterval. Several options
exist for selecting the height of such a rectangle: you can use the lower function
value f(tᵢ), the upper function value f(tᵢ₊₁) or any value in between, that is f(tᵢ*)
with tᵢ < tᵢ* < tᵢ₊₁. When using the left function values in determining the area of
each rectangle and adding all rectangles, we obtain the lower sum:

\[ \text{lower sum}_n = \sum_{i=1}^{n} f(t_i)\,(t_{i+1} - t_i). \tag{5.4} \]

If we use the right function values, we obtain the upper sum:

\[ \text{upper sum}_n = \sum_{i=1}^{n} f(t_{i+1})\,(t_{i+1} - t_i). \tag{5.5} \]


If the integral is to represent the area below the function, the upper sum (for a
monotonically growing function) will be larger and the lower sum will be smaller
than the area below the function. If one allows the decomposition to become ever
finer (n → ∞), it all depends on how the two sums will behave. Riemann succeeded
in proving that the choice of the function value for certain functions is irrelevant.
Regardless of which function value is selected, the resulting sum of all rectangles
converges to the same value if the number of subintervals n goes to infinity.³ This
result is known as the Riemann integral

\[ \int_a^b f(t)\,dt := \lim_{n \to \infty} \text{upper sum}_n = \lim_{n \to \infty} \text{lower sum}_n. \tag{5.6} \]


Figure 5.1 illustrates the process of constructing the Riemann integral for the case 
of a triple, a sixfold, and finally an infinite segmentation of the interval [a, b]. 
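The construction can be traced numerically. The sketch below computes the lower and upper sums (5.4) and (5.5) under two assumptions that are ours and not the text's: the subintervals have equal width, and the integrand is the increasing function f(t) = t² on [0, 1], whose integral is 1/3. The index runs from 0 to n − 1 in the code, which is only a shift of the numbering.

```python
def riemann_sums(f, a, b, n):
    """Lower and upper sums (5.4) and (5.5) for an increasing f on [a, b]."""
    width = (b - a) / n
    grid = [a + i * width for i in range(n + 1)]
    lower = sum(f(grid[i]) * width for i in range(n))      # left function values
    upper = sum(f(grid[i + 1]) * width for i in range(n))  # right function values
    return lower, upper


def f(t):
    return t * t  # illustrative choice; the exact integral on [0, 1] is 1/3


for n in (3, 6, 1000):
    lower, upper = riemann_sums(f, 0.0, 1.0, n)
    print(n, lower, upper, upper - lower)
```

For an increasing function the gap between the two sums equals (f(b) − f(a)) · (b − a)/n, so it vanishes as n grows.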


³ This applies, for example, to continuous functions. In general, a function is defined as Riemann
integrable exactly when upper and lower sums converge to the same value for any decomposition.

Fig. 5.1 Illustration of the upper sums of the Riemann integral with three, six, and infinitely many
subintervals


In order to apply the Riemann integral, certain requirements must be met. In
particular, the domain of the function to be integrated must be a closed
interval from the set of real numbers, because no other interval can be divided into
an infinite number of subintervals. The function f(ω) of the dice roll example from
page 65 has no such domain.


5.3 Lebesgue Integral 


The state space Ω = ℝ is a very special prerequisite. How should the idea of
integration be applied to a situation where the domain of a function does not
cover a closed interval of real numbers? Earlier we have pointed out that state spaces
other than those covering the real numbers do exist.⁴ The state space Ω = C[0, ∞)
includes all continuous functions starting at zero. An important question must be
addressed. How should this set be divided into equal-sized subintervals? We can
order real numbers by their value and thus form intervals; with continuous functions,
however, such a procedure is not possible. Since the Riemann integral cannot be
used we must explore a different way of calculating the expected value of random
variables over C[0, ∞).

The French mathematician Lebesgue had the ingenious idea of how to proceed. 
He suggested splitting the ordinate into subintervals rather than the abscissa. 
Regardless of the characteristics of the state space Q each corresponding function 
maps into the real numbers. The specific segmentation of the ordinate, however, 
depends on the actual random variable. The procedure is described below and 
illustrated in Fig. 5.2 for the same function used in Fig. 5.1. 

With Lebesgue integration the ordinate is split into subintervals. In Fig. 5.2 we
divide the interval [f₀, f₃) into three subintervals [f₀, f₁), [f₁, f₂), and [f₂, f₃).
Doing so allows us to identify three subsets on the abscissa. The subsets A₁, A₂,


⁴ See page 27.

Fig. 5.2 Lebesgue integral: sets of inverse images of a function to be integrated f(t)
and A₃ result in principle from the inverse images of the function f, thus

\[ A_i = f^{-1}([f_{i-1}, f_i)), \quad i = 1, 2, 3. \tag{5.7} \]


As shown in Fig. 5.2 the interval [f₂, f₃) of the ordinate is assigned the inverse
image A₃ on the abscissa. Similarly, the intervals [f₁, f₂) and [f₀, f₁) map into
the inverse images A₂ and A₁, respectively. In order to be able to integrate, the
subsets represented by the inverse images must be measurable, i.e., come from the
σ-algebra.

We want to understand the implication for the function f(·). Will every arbitrary
function be integrable using this idea? If we divide the ordinate into subintervals,
the corresponding subsets are automatically created on the abscissa. If the interval
[fᵢ₋₁, fᵢ) is part of a segmentation, the corresponding subset on the abscissa is
defined by

\[
\begin{aligned}
A_i :&= \{\omega : f_{i-1} \le f(\omega) < f_i\} \\
&= \{\omega : f(\omega) < f_i\} \cap \left(\Omega \setminus \{\omega : f(\omega) < f_{i-1}\}\right).
\end{aligned}
\tag{5.8}
\]

It is therefore sufficient to require that the inverse image {ω : f(ω) < a} of the
function f(·) is a measurable set for all real numbers a.⁵ Note that this property
defines a measurable function.⁶ Therefore, every measurable function is Lebesgue
integrable.


⁵ If the set {ω : f(ω) < a} is measurable, then this also applies to the complement
Ω \ {ω : f(ω) < a} as well as to the intersection of this subset with another measurable
subset. Otherwise we would not have a σ-algebra.

⁶ See Definition 4.1 on page 65.

Fig. 5.3 Lebesgue integral: illustration of the upper sum


After these considerations we can present Lebesgue's idea in its entirety.
Analogous to the Riemann integral (which measures the area under a function) we will
approximate this area by using upper and lower sums again. If a function f(·) is
measurable the integral can be approximated by the “upper sum”

\[ \text{upper sum} := f_1 \cdot \mu(A_1) + f_2 \cdot \mu(A_2) + f_3 \cdot \mu(A_3). \tag{5.9} \]

First we realize why this expression is always greater than the value of the integral
and therefore represents a first approximation of the area like an upper sum. For this
purpose we redraw Fig. 5.2 and focus on the rectangles of the approximation
f₁ · μ(A₁), ..., f₃ · μ(A₃) from (5.9), leading to Fig. 5.3. Two rectangles are highlighted.
They have the widths μ(A₁) and μ(A₃) and the heights f₁ and f₃ respectively.
The sum of these rectangles overstates the area because the function runs below the
upper corners of the rectangles. The area we have determined by using the upper
sum in expression (5.9) is greater than the integral. We can construct a lower sum in
analogy

\[ \text{lower sum} := f_0 \cdot \mu(A_1) + f_1 \cdot \mu(A_2) + f_2 \cdot \mu(A_3). \tag{5.10} \]

Let us suppose that the two sums converge to the same value in the limit
when the subintervals on the ordinate get infinitely small. If the two limits converge
to the same value for any segmentation of the ordinate, the function f(·) is called
Lebesgue integrable. This value is the Lebesgue integral of the function and usually
written in the form


\[ \int_\Omega f(\omega)\,d\mu(\omega). \tag{5.11} \]

Fig. 5.4 Interaction of measurement and function in the Lebesgue integral (the function f acts on an
element ω ∈ Ω, the measure μ on a subset f⁻¹([x, x + dx]) ⊂ Ω)

Note that in the Lebesgue integral both the function f and the measure μ refer
to the basic set Ω, however, in a different way. The function f(ω) assigns a real
number to each element of Ω. However, we must proceed differently when dealing
with the measure dμ(ω): it is assigned a subset f⁻¹([x, x + dx]) ⊂ Ω and not a
single element, as illustrated in Fig. 5.4.
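The upper and lower sums (5.9) and (5.10) can be sketched numerically once the measure of the inverse images is known. The example below is an illustration under our own assumptions: f(t) = t² on [0, 1) with the ordinary length as measure, so that the inverse image of an ordinate interval [lo, hi) is the abscissa interval [√lo, √hi). The function names are hypothetical.

```python
import math


def lebesgue_sums(levels, preimage_measure):
    """Lower and upper sums in the spirit of (5.10) and (5.9).

    levels           -- ordinate grid f_0 < f_1 < ... < f_n
    preimage_measure -- returns mu({t : lo <= f(t) < hi})
    """
    upper = lower = 0.0
    for lo, hi in zip(levels, levels[1:]):
        mass = preimage_measure(lo, hi)  # measure of the inverse image A_i
        upper += hi * mass
        lower += lo * mass
    return lower, upper


def preimage_measure(lo, hi):
    """Length of {t in [0, 1) : lo <= t^2 < hi} (assumed example function)."""
    return math.sqrt(hi) - math.sqrt(lo)


for n in (3, 6, 1000):
    levels = [i / n for i in range(n + 1)]  # split the ordinate, not the abscissa
    print(n, lebesgue_sums(levels, preimage_measure))
```

Both sums approach 1/3, the same value the Riemann construction yields for this function; the only place where the state space enters is `preimage_measure`.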

The calculation rules for the Lebesgue integral are quite similar to those for the
Riemann integral. These rules are:

Proposition 5.1 (Calculation Rules for Lebesgue Integrals) For Lebesgue
integrable functions f and g the following applies

\[ \int_\Omega (f + g)\,d\mu = \int_\Omega f\,d\mu + \int_\Omega g\,d\mu, \tag{5.12} \]

\[ \int_\Omega a f\,d\mu = a \int_\Omega f\,d\mu \quad \forall a \in \mathbb{R}. \tag{5.13} \]


Applying these rules the integral over f cannot be smaller than the integral over
g if f(x) ≥ g(x) for all x ∈ Ω. To prove this claim we first look at the rule

\[ \int_\Omega f\,d\mu - \int_\Omega g\,d\mu = \int_\Omega (f - g)\,d\mu. \tag{5.14} \]

The difference f − g is nonnegative according to the assumption f(x) ≥ g(x).
However, it follows from the construction of the Lebesgue integral that we multiply
these nonnegative differences by nonnegative values of the measure and add the
products. The integral of the difference therefore cannot be negative, and this is
what we have asserted.
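On a finite state space the integral reduces to a weighted sum, which makes the calculation rules easy to verify. The following sketch assumes a fair die as measure space and two arbitrarily chosen functions; all names are hypothetical.

```python
from fractions import Fraction

# Hypothetical finite measure space: a fair die.
OMEGA = range(1, 7)
MU = {w: Fraction(1, 6) for w in OMEGA}


def integral(func):
    """Integral of func over OMEGA with respect to MU (a finite sum)."""
    return sum(func(w) * MU[w] for w in OMEGA)


def f(w):
    return w * w


def g(w):
    return w  # f(w) >= g(w) for every w in OMEGA


a = 3

assert integral(lambda w: f(w) + g(w)) == integral(f) + integral(g)  # (5.12)
assert integral(lambda w: a * f(w)) == a * integral(f)               # (5.13)
assert integral(f) - integral(g) == integral(lambda w: f(w) - g(w))  # (5.14)
assert integral(f) >= integral(g)                                    # monotonicity
print(integral(f), integral(g))
```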
Let us illustrate the Lebesgue integrability using three examples. 


Example 5.1 (Dirichlet Function) We consider the so-called Dirichlet function⁷

\[
D(x) = \begin{cases}
1, & \text{if } x \text{ rational;} \\
0, & \text{else.}
\end{cases}
\tag{5.15}
\]


⁷ Peter Gustav Lejeune Dirichlet (1805-1859, German mathematician).

We are interested in the value the Lebesgue integral has over a Stieltjes measure.
We only assume μ(ℚ) = 0 and g(1) = 1. The first property is always true with any
Stieltjes measure as shown earlier.

To calculate the integral we divide the ordinate into the following five subintervals⁸

\[ (\infty, 1), \quad [1, 1], \quad (1, 0), \quad [0, 0], \quad (0, -\infty). \]

These subintervals have inverse images of the Dirichlet function on the domain
ℝ, which we designate as A₁ to A₅:

\[
A_1 = f^{-1}((\infty, 1)), \quad A_2 = f^{-1}([1, 1]), \quad A_3 = f^{-1}((1, 0)), \quad
A_4 = f^{-1}([0, 0]), \quad A_5 = f^{-1}((0, -\infty)).
\]

It is obvious that no function value exists in the first, third, and fifth subintervals.
The corresponding inverse images are empty, their measure is zero,

\[ A_1 = A_3 = A_5 = \emptyset \quad \Longrightarrow \quad \mu(A_1) = \mu(A_3) = \mu(A_5) = 0. \]

Let us focus initially on the second and fourth intervals only. The Dirichlet
function is constructed such that all rational numbers ℚ are contained in the inverse
image A₂, while all irrational numbers ℝ \ ℚ are contained in A₄.

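In order to determine the Lebesgue integral we calculate the upper sums.
According to (5.9) the upper sums are

\[
\begin{aligned}
\int_{\mathbb{R}} D(x)\,d\mu(x) &\le \text{upper sum} \\
&= \lim_{n \to \infty} n \cdot \mu(A_1) + 1 \cdot \mu(A_2) + 1 \cdot \mu(A_3) + 0 \cdot \mu(A_4) + 0 \cdot \mu(A_5) \\
&= 1 \cdot \mu(\mathbb{Q}) + 0 \cdot \mu(\mathbb{R} \setminus \mathbb{Q}).
\end{aligned}
\tag{5.16}
\]

Each Stieltjes measure of rational numbers is zero (see page 55). From this it follows
that the Stieltjes measure of the irrational numbers is one,⁹ resulting in

\[ \int_{\mathbb{R}} D(x)\,d\mu(x) \le 0. \tag{5.17} \]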
R 


⁸ This procedure is not quite correct, since we should break down the ordinate into half-open
subintervals and actually not a single interval fulfills this characteristic. It is therefore not readily
clear whether the inverse images are measurable sets at all. In our case, however, this does not lead
to a problem, which is why we consider our approach to be appropriate.

⁹ The interval [0, 1] has the Stieltjes measure one (remember g(1) = 1). Since the rational numbers
are countable, they have measure zero. The difference between [0, 1] and the rational numbers,
i.e., the irrational numbers, must therefore have a measure of one.

Similarly, we can determine the lower sums obtaining

\[
\begin{aligned}
\int_{\mathbb{R}} D(x)\,d\mu(x) &\ge \text{lower sum} \\
&= 1 \cdot \mu(A_1) + 1 \cdot \mu(A_2) + 0 \cdot \mu(A_3) + 0 \cdot \mu(A_4) + \lim_{n \to -\infty} n \cdot \mu(A_5) \\
&= 1 \cdot \mu(\mathbb{Q}) + 0 \cdot \mu(\mathbb{R} \setminus \mathbb{Q}) \\
&= 0.
\end{aligned}
\tag{5.18}
\]


Hence, the Lebesgue integral of the Dirichlet function is zero. 


This example is interesting for the following reason. Suppose we want to
determine the classic Riemann integral ∫ₐᵇ D(x) dx. We would have to construct
upper and lower sums on the interval [a, b] which include the function D(x).
Regardless of how we break down the abscissa, the following always applies: in
every subinterval of [a, b] both rational and irrational numbers exist. Thus the
upper sum of the Riemann integral is always b − a, and the lower sum is always zero.
Thus, the Dirichlet function cannot be Riemann integrated because the two sums
do not converge to a common value. Our example illustrates the point that
the Lebesgue integral can be used in situations where the Riemann integral cannot.
Thus, the Lebesgue integral is far more powerful.
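The failure of the Riemann construction can be written out explicitly. The text defines upper and lower sums for monotone functions only; for the Dirichlet function we assume the usual generalization in which the largest and the smallest function value on each subinterval are used. With t₁ = a and tₙ₊₁ = b this gives, for every decomposition,

```latex
\[
\text{upper sum}_n
  = \sum_{i=1}^{n} \underbrace{\sup_{t \in [t_i, t_{i+1}]} D(t)}_{=1}\,(t_{i+1} - t_i)
  = b - a,
\qquad
\text{lower sum}_n
  = \sum_{i=1}^{n} \underbrace{\inf_{t \in [t_i, t_{i+1}]} D(t)}_{=0}\,(t_{i+1} - t_i)
  = 0 .
\]
```

The upper sum equals the length of the interval (one for an interval of unit length) and the lower sum is zero, no matter how fine the decomposition is, so the two sums cannot have a common limit.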


Example 5.2 (Power of the Lebesgue Integral) In the previous example we had
used an arbitrary Stieltjes measure μ with g(1) = 1 and considered the particular
Dirichlet function D.

Now f will be an arbitrary function with a particular measure μ = δₐ, the Dirac
measure.

We calculate the integral of an arbitrary function over the Dirac measure δₐ,

\[ \int_\Omega f(\omega)\,d\delta_a. \tag{5.19} \]

The Dirac measure of the set Ω \ {a} is zero. Therefore, it is meaningful to divide
the ordinate into three subintervals,¹⁰

\[ \text{Ordinate} = (-\infty, f(a)) \cup \{f(a)\} \cup (f(a), \infty). \tag{5.20} \]


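¹⁰ The middle subinterval is a closed one again. In Footnote 8 we had already pointed out that it is
not strictly allowed to do this, but we do this for convenience.

Concentrating on the upper sum we obtain

\[
\begin{aligned}
\int_\Omega f(\omega)\,d\delta_a &\le \text{upper sum} \\
&= f(a) \cdot \delta_a(\{\omega \in \Omega : f(\omega) < f(a)\}) \\
&\quad + f(a) \cdot \delta_a(\{\omega \in \Omega : f(\omega) = f(a)\}) \\
&\quad + \lim_{n \to \infty} n \cdot \delta_a(\{\omega \in \Omega : f(\omega) > f(a)\}) \\
&= f(a) \cdot \delta_a(\emptyset) + f(a) \cdot \delta_a(\Omega) + \lim_{n \to \infty} n \cdot \delta_a(\emptyset).
\end{aligned}
\tag{5.21}
\]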


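While the measure in the first and third terms is zero, the measure in the second term
is one. This leads to ∫_Ω f(ω) dδₐ ≤ f(a).
Analogously, the lower sum is

\[
\begin{aligned}
\int_\Omega f(\omega)\,d\delta_a &\ge \text{lower sum} \\
&= \lim_{n \to -\infty} n \cdot \delta_a(\{\omega \in \Omega : f(\omega) < f(a)\}) \\
&\quad + f(a) \cdot \delta_a(\{\omega \in \Omega : f(\omega) = f(a)\}) \\
&\quad + f(a) \cdot \delta_a(\{\omega \in \Omega : f(\omega) > f(a)\}).
\end{aligned}
\tag{5.22}
\]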


This leads to ∫_Ω f(ω) dδₐ ≥ f(a).
Therefore, the Lebesgue integral equals the function value at a

\[ \int_\Omega f(\omega)\,d\delta_a = f(a). \tag{5.23} \]
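The three-part split of the ordinate can be mirrored in code. In this sketch a set is represented by a membership test, and the Dirac measure at a assigns measure one to a set exactly when a belongs to it; this representation and the function names are our own illustrative choices.

```python
import math


def dirac(a):
    """Dirac measure at a: a set (given as a membership test) has measure 1 iff it contains a."""
    return lambda contains: 1.0 if contains(a) else 0.0


def integral_wrt_dirac(f, a):
    """Three-part ordinate split of (5.20); only the middle part carries mass."""
    delta = dirac(a)
    below = delta(lambda w: f(w) < f(a))   # measure 0
    level = delta(lambda w: f(w) == f(a))  # measure 1
    above = delta(lambda w: f(w) > f(a))   # measure 0
    assert (below, level, above) == (0.0, 1.0, 0.0)
    return f(a) * level


print(integral_wrt_dirac(math.cos, 0.0))  # 1.0
```

Whatever the function f looks like away from a, the result is always the single function value f(a).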


The above result cannot be obtained with a Riemann integral. There is no
function g(x) such that

\[ \int_{-\infty}^{\infty} f(x)\,g(x)\,dx = f(a) \tag{5.24} \]

for any arbitrary function f(·) and any arbitrary number a: the function g would
have to be infinite at a and zero otherwise.

Illustrating this we consider a function gₙ(x) that has value n in the neighborhood
of a. Outside the neighborhood the function has value zero. The neighborhood
corresponds to the interval (a − 1/(2n), a + 1/(2n)), which is getting smaller and
smaller with increasing n. (Why we choose exactly this and no other neighborhood will
become clear soon.) Figure 5.5 shows the typical course of such a function gₙ(x).
Let us integrate the product of f(x) and gₙ(x). Because the product of both
functions outside the neighborhood of a is zero, we can ignore this part of the
integral. From Fig. 5.5 we know the value of gₙ(x) in the neighborhood of a. This
gives us

\[ \int_{-\infty}^{\infty} f(x) \cdot g_n(x)\,dx = \int_{a - \frac{1}{2n}}^{a + \frac{1}{2n}} f(x) \cdot n\,dx. \tag{5.25} \]

Fig. 5.5 Futile attempt to construct a Lebesgue integral with the Dirac measure using a Riemann
integral


Using the mean value theorem of integral calculus we can determine this
integral more easily. As long as n is finite the integral corresponds approximately
to the product of the value f(a) · n and the length of the interval, 1/n. This product
equals f(a) · n · (1/n) = f(a). If n tends to infinity the integral converges to f(a).
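The convergence can be observed numerically. The sketch below assumes a neighborhood of length 1/n centered at a, so that the area under the spike is always one, and evaluates the integral with a simple midpoint rule; the test function exp and the point a = 1 are arbitrary illustrative choices.

```python
import math


def smoothed_value(f, a, n, steps=1000):
    """Midpoint-rule value of the integral of f(x) * n over (a - 1/(2n), a + 1/(2n))."""
    left = a - 1.0 / (2 * n)
    dx = (1.0 / n) / steps
    return sum(f(left + (k + 0.5) * dx) * n * dx for k in range(steps))


a = 1.0
for n in (1, 10, 100, 1000):
    print(n, smoothed_value(math.exp, a, n), math.exp(a))
```

For every finite n the integral is only close to f(a); equality holds in the limit alone, which is why no ordinary function g can do the job.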
We conclude: anyone trying to achieve a result of the form

\[ f(a) = \int_{-\infty}^{\infty} f(x)\,g(x)\,dx \tag{5.26} \]

with classic Riemann integration must use a function g(x) which is zero outside
a and assumes the value “infinite” at a. However, such functions do not exist in
classical analysis.¹¹ In contrast, the result

\[ f(a) = \int_\Omega f(\omega)\,d\mu(\omega) \tag{5.27} \]

can be obtained for any f using the Dirac measure μ = δₐ. This once again shows
the power of Lebesgue's integration concept.


¹¹ Functions are unique mappings into real numbers, and infinity is not a real number.

Example 5.3 (Lebesgue and Riemann Integral Give Identical Results) In
Examples 5.1 and 5.2 we showed that a Lebesgue integral is applicable in situations where
a Riemann integral is not. In this example we will show that under certain conditions
a Lebesgue integral delivers a result which is identical to a Riemann integral.

We consider a strictly monotonic function¹² f(x) over the interval [0, 1] and
want to calculate the Lebesgue integral ∫_[0,1] f(x) dμ(x). The measure μ is a Stieltjes
measure generated by the differentiable and strictly monotonic function g(x).

Due to the strict monotonicity the function value lies in the closed interval
[f(0), f(1)] which we divide into n subintervals. It makes sense to use the
subintervals [f(i/n), f((i+1)/n)) with the index i running from 0 to n − 1. We
can determine the inverse image areas of these subintervals. Due to the strict
monotonicity of f the inverse function exists and the following applies:

\[ f^{-1}\left(\left[f\left(\tfrac{i}{n}\right), f\left(\tfrac{i+1}{n}\right)\right)\right) = \left[\tfrac{i}{n}, \tfrac{i+1}{n}\right). \tag{5.28} \]


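Looking at the lower sums of the Lebesgue integral and letting n go to infinity we
get

\[ \int_{[0,1]} f(x)\,d\mu(x) = \lim_{n \to \infty} \sum_{i=0}^{n-1} f\left(\tfrac{i}{n}\right) \mu\left(\left[\tfrac{i}{n}, \tfrac{i+1}{n}\right)\right). \tag{5.29} \]

The Stieltjes measure of this interval is determined by Eq. (3.38).¹³ Therefore we get

\[ \int_{[0,1]} f(x)\,d\mu(x) = \lim_{n \to \infty} \sum_{i=0}^{n-1} f\left(\tfrac{i}{n}\right) \left( g\left(\tfrac{i+1}{n}\right) - g\left(\tfrac{i}{n}\right) \right). \tag{5.30} \]

We rewrite this equation in a slightly more complicated form which will turn out to
be suitable in a moment

\[ \int_{[0,1]} f(x)\,d\mu(x) = \lim_{n \to \infty} \sum_{i=0}^{n-1} f\left(\tfrac{i}{n}\right) \underbrace{\frac{g\left(\tfrac{i+1}{n}\right) - g\left(\tfrac{i}{n}\right)}{\tfrac{1}{n}}}_{=:\,z} \cdot \frac{1}{n}. \tag{5.31} \]

The term marked z for n → ∞ corresponds to the first derivative of g which leads
to

\[ \int_{[0,1]} f(x)\,d\mu(x) = \lim_{n \to \infty} \sum_{i=0}^{n-1} f\left(\tfrac{i}{n}\right) g'\left(\tfrac{i}{n}\right) \cdot \frac{1}{n}. \tag{5.32} \]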


¹² The following remarks also apply to non-monotonic functions f. Then, however, the proofs
are more complicated.

¹³ See page 52.

The right expression is the classic Riemann integral ∫₀¹ f · g′ dx.
Therefore the following holds:

\[ \underbrace{\int_{[0,1]} f(x)\,d\mu(x)}_{\text{Lebesgue}} = \underbrace{\int_0^1 f \cdot g'\,dx}_{\text{Riemann}}. \tag{5.33} \]


To summarize: the Lebesgue integral with Stieltjes measures is a generalization of 
a Riemann integral. 
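The agreement of both integrals can be checked for a concrete pair of functions. The sketch below compares the sum from (5.30) with the sum from (5.32); the choices f(x) = x and g(x) = x², for which the Riemann integral of f · g′ over [0, 1] equals 2/3, are assumptions made only for this illustration.

```python
def stieltjes_lower_sum(f, g, n):
    """Sum of f(i/n) * (g((i+1)/n) - g(i/n)) for i = 0, ..., n-1, as in (5.30)."""
    return sum(f(i / n) * (g((i + 1) / n) - g(i / n)) for i in range(n))


def riemann_sum(f, g_prime, n):
    """Sum of f(i/n) * g'(i/n) * (1/n) for i = 0, ..., n-1, as in (5.32)."""
    return sum(f(i / n) * g_prime(i / n) / n for i in range(n))


# Illustrative choices: f(x) = x, g(x) = x^2, hence g'(x) = 2x.
def f(x):
    return x


def g(x):
    return x * x


def g_prime(x):
    return 2 * x


for n in (10, 100, 10000):
    print(n, stieltjes_lower_sum(f, g, n), riemann_sum(f, g_prime, n))
```

For finite n the two sums differ slightly, but both approach 2/3 as n grows.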

5.4 Result: Expectation and Variance as Lebesgue Integral 

On the basis of the material presented in the previous sections we are able to
define the expectation and the variance of a random variable Z, even if the state
space does not correspond to the real numbers. The expectation and variance are
Lebesgue integrals over the probability measure of the state space Ω. Specifically,
the following applies

\[ E[Z] := \int_\Omega Z(\omega)\,d\mu(\omega), \tag{5.34} \]

\[ \operatorname{Var}[Z] := \int_\Omega \left(Z(\omega) - E[Z]\right)^2 d\mu(\omega). \tag{5.35} \]
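Also, the following applies:

\[
\begin{aligned}
\operatorname{Var}[Z] &= \int_\Omega \left(Z^2(\omega) - 2E[Z]Z(\omega) + E^2[Z]\right) d\mu(\omega) \\
&= \int_\Omega Z^2(\omega)\,d\mu(\omega) - 2E[Z] \underbrace{\int_\Omega Z(\omega)\,d\mu(\omega)}_{=E[Z]} + E^2[Z] \underbrace{\int_\Omega d\mu(\omega)}_{=\mu(\Omega)=1} \\
&= \int_\Omega Z^2(\omega)\,d\mu(\omega) - E^2[Z].
\end{aligned}
\tag{5.36}
\]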


This is known as the decomposition theorem of variance which could also be written
more concisely as Var[Z] = E[Z²] − E²[Z].
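For a finite state space both sides of the decomposition can be computed exactly. The sketch assumes, for illustration only, that Z is the score of a fair die, so that the integrals reduce to sums over six states.

```python
from fractions import Fraction

# Hypothetical example: Z is the score of a fair die.
OMEGA = range(1, 7)
MU = Fraction(1, 6)


def expectation(func):
    """E[func(Z)] as a finite sum over the six states."""
    return sum(func(w) * MU for w in OMEGA)


mean = expectation(lambda w: w)
variance_by_definition = expectation(lambda w: (w - mean) ** 2)       # (5.35)
variance_by_decomposition = expectation(lambda w: w * w) - mean ** 2  # E[Z^2] - E^2[Z]

print(mean, variance_by_definition, variance_by_decomposition)
```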


5.5 Conditional Expectation 
In the previous section we have shown the process of determining the expectation
of a random variable using the Lebesgue integral. In doing so we have, however,
ignored an aspect which plays a major role in financial problems. Analyzing an
investment decision requires the evaluation of future cash flows that will occur over
a period of several years t = 1, 2, ....

In particular we assume that the investment decision must be made today (in
t = 0) and cannot or should not be revised. Given these starting conditions the
future cash flows the decision-maker currently expects to occur in t = 1, 2, ... must
enter the evaluation process. The expected values of these cash flows are called
“classical” or “unconditional expectations.”

Let us now change the perspective of the decision-maker: the decision-maker
is interested from the very beginning in a flexible investment plan, i.e., he is
considering also possible modifications of the original investment decision. For
example, this could include that the decision-maker can build in t = 0 either a
larger or a smaller production facility. At t = 1 there should also be the possibility
to expand a smaller factory or to abandon the investment.

Once t = 1 has occurred the decision-maker will have newer and different
information about the probabilities of future cash flows than he had in t = 0. Today
he can only decide on the basis of the information available at t = 0. Thus, the
decision-maker can at best consider those future cash flows which he believes in
t = 0 will be realized in later periods given that certain conditions will take place.
From today's perspective the future cash flow developments in t ≥ 1 could either
be influenced by a boom or a bust. Such state-dependent expectations are called
“conditional expectations.” It is therefore very important to distinguish between
unconditional and conditional expectations and to be aware of their implications.


Conditional Expectations Regarding an Event Let us clarify what distinguishes
a conditional expectation from an unconditional one. In general, the expectation
of a random variable is the weighted average of its realizations over all possible
states with the weights representing their probabilities. The expectation describes
something like the average result of a random variable. Of course, you need certain
information about future events to be able to calculate expectations. Therefore, we
must look more closely at this information.

The information a decision-maker has available can be described in more detail
using the σ-algebra F as shown in Sect. 3.2. Given the measure space (Ω, F, μ)
we know the event space Ω, the σ-algebra F of measurable sets, and the probability
measure μ. Considering a subset A ⊂ Ω we assume that this subset is measurable
(A ∈ F). In other words, it can be determined whether a specific event does or does
not belong to A. One should be interested in how large the expectation is if one
limits oneself to elements of A. This implies that only events from A are included
in the calculation of the expectation and that the relevant probabilities have to be
normalized such that they sum to one.

This concept can be illustrated particularly well with the help of a binomial
model. For this purpose we use again Fig. 2.5 from page 25 but add specific
numerical values.

Fig. 5.6 Binomial model with cash flows CF to t = 3


Example 5.4 (Binomial Model) Figure 5.6 shows a binomial model with three
points in time which describes future cash flows. Further, the upward and downward
movements are equally probable.

The didactic advantage of the binomial model is that questions of how individual
events can be measured do not complicate the presentation of the relevant problem.
In this example any set of events can be measured. Let us focus on the node of 125
at t = 3.

There exist three possible states ending in this node. These states are the
elementary events udd, ddu, and dud. If we want to determine the expectation
of the cash flows at time t = 2 assuming that the payment 125 is reached in t = 3,
this must be done as follows:

Starting from the condition CF₃ = 125, only the three events udd, ddu, and
dud are possible. These events are equally likely; their normalized conditional
probabilities are therefore 1/3. This results in the conditional expectation of

\[ E[CF_2 \mid CF_3 = 125] = \underbrace{\tfrac{1}{3} \cdot 100}_{udd} + \underbrace{\tfrac{1}{3} \cdot 100}_{dud} + \underbrace{\tfrac{1}{3} \cdot 60}_{ddu} \approx 86.67. \]

The following example can possibly make the approach clearer. 


Example 5.5 (Dice Roll) Consider the example of a dice roll, with event A being
an even score, i.e.,

\[ A = \{2, 4, 6\}. \]

With an ideal dice every score can happen with the probability 1/6. Restricting our
consideration to the even scores three cases are possible. The conditional probability
of each even score is 1/3. The expectation conditional on A is the sum of the products
of the (even) scores with their conditional probabilities. Formally:

\[ E[X \mid A] = \tfrac{1}{3} \cdot 2 + \tfrac{1}{3} \cdot 4 + \tfrac{1}{3} \cdot 6 = 4. \]

The conditional expectation of an odd score being rolled, that is for the event
A = {1, 3, 5},
is 3.


Using a so-called indicator function,¹⁴ the above results can be generalized. The
expectation conditional on the measurable subset A can be expressed as

\[ E[X \mid A] = \frac{\int_\Omega X \cdot 1_A\,d\mu}{\mu(A)}. \tag{5.37} \]


Equation (5.37) makes sense only for events with a positive probability. 
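On a finite sample space the integral in Eq. (5.37) becomes a sum, which makes the formula easy to try out. The sketch below applies it to the dice roll; the function name `conditional_expectation` and its arguments are hypothetical, and X is taken to be the score itself.

```python
from fractions import Fraction

def conditional_expectation(values, prob, event):
    """Discrete version of Eq. (5.37): sum of X * 1_A * prob, divided by prob(A)."""
    indicator = lambda x: 1 if x in event else 0
    mass = sum(prob[x] * indicator(x) for x in values)
    if mass == 0:
        # Mirrors the remark that (5.37) needs an event of positive probability.
        raise ValueError("event has probability zero")
    return sum(x * indicator(x) * prob[x] for x in values) / mass

scores = range(1, 7)
prob = {x: Fraction(1, 6) for x in scores}   # ideal dice

even = conditional_expectation(scores, prob, {2, 4, 6})
odd = conditional_expectation(scores, prob, {1, 3, 5})
print(even, odd)
```

Exact fractions are used so that the two conditional expectations come out as the integers 4 and 3 rather than as rounded floating-point numbers.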


Conditional Expectation Regarding a σ-Algebra Let us expand our analysis and
investigate an expectation not only regarding a single event A but also regarding a
whole σ-algebra. Initially, we will explain what this means in mathematical terms.

We have stated earlier that a σ-algebra can be thought of as an information
system.¹⁵ The elements of the σ-algebra describe those events which a decision-maker
can observe or verify. We know how to determine the expectation of a
quantity X in relation to a single observable event A ∈ ℱ: it is the conditional
expectation E[X|A]. If at t = 0 a decision-maker reflects on the future, he cannot
restrict himself to only one event A. Indeed, he will take into account that the
complement of A can also occur. Given the possible events of
the σ-algebra, the decision-maker should obviously try to obtain an overview of
all possible expectations of X. Thus, he calculates not only a single conditional
expectation but all conditional expectations for the conceivable sets A ∈ ℱ of
the σ-algebra. Let us illustrate this aspect in the context of a binomial model.


Example 5.6 (Binomial Model) Referring to Fig. 5.6 on page 82 we focus on time
t = 2. We have shown previously (page 43) how to describe the information which
a decision-maker today believes he will have available at time t = 2. This is the
σ-algebra ℱ₂, which can be generated from the elements of the set


A = {{uuu, uud}, {udu, udd}, {duu, dud}, {ddu, ddd}}.


¹⁴ This function is one on A and zero otherwise, so

\[
1_A(x) := \begin{cases} 1, & x \in A, \\ 0, & x \notin A. \end{cases}
\]

¹⁵ See page 42.
Each element of the set A collects elementary events which cannot yet be distinguished
at time t = 2. Let us concentrate on all cash flows CF3 given in Fig. 5.6 for time
t = 3 and try to determine their expectations on the basis of the information the
decision-maker believes at t = 0 he will have available at t = 2. For this purpose,
we decompose the set A into the pairwise disjoint subsets¹⁶

A₁ = {uuu, uud}
A₂ = {{udu, udd}, {duu, dud}}
A₃ = {ddu, ddd}

with

A₁ ∪ A₂ ∪ A₃ = A.


Since A is only the set that generates the σ-algebra ℱ₂, A ⊂ ℱ₂ applies. It
is easy to see that the expected cash flows CF3 depend on which of the three
subsets is considered. Considering the subset A₁, only the cash flows 140 and
130 associated with the elementary events uuu and uud can materialize at time
t = 3. Their expectation is 1/2 · 140 + 1/2 · 130 = 135. Correspondingly, for
subset A₂ only the cash flows 130 and 125 can occur, and their expectation equals
1/2 · 130 + 1/2 · 125 = 127.5. Similarly, for subset A₃ the cash flows 125 and 40 matter
and lead to 1/2 · 125 + 1/2 · 40 = 82.5. Summarizing, we have

\[
E[CF_3 \mid A] = \begin{cases} 135.0, & \text{if } A = A_1, \\ 127.5, & \text{if } A = A_2, \\ 82.5, & \text{if } A = A_3. \end{cases}
\]


Emphasizing the information that the decision-maker will have available at t = 2, 
the expectation can also be written in a somewhat more casual form 


\[
E[CF_3 \mid \mathcal{F}_2] = \begin{cases} 135.0, & \text{if at } t = 2 \quad uu, \\ 127.5, & \text{if at } t = 2 \quad ud \text{ or } du, \\ 82.5, & \text{if at } t = 2 \quad dd. \end{cases} \tag{5.38}
\]

Note that on the right side of Eq. (5.38) only the three possible nodes at time t = 2
are mentioned, while on the left side the σ-algebra ℱ₂ is used, which includes more
information than only set A. The notation of this equation is therefore not fully
precise and a bit more casual.
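The state-dependent expectation can be reproduced by grouping the eight paths according to what is observable at t = 2. This is a minimal sketch under the assumption that the t = 3 node values are those read off Fig. 5.6; the dictionary keys (number of up-moves) are an illustrative encoding of the nodes uu, ud/du, and dd.

```python
from itertools import product

# t = 3 cash flows keyed by the number of up-moves (values assumed from Fig. 5.6).
CF3 = {3: 140, 2: 130, 1: 125, 0: 40}
paths = ["".join(p) for p in product("ud", repeat=3)]

# Group the paths by the node reached at t = 2:
# 2 up-moves -> uu, 1 up-move -> ud or du, 0 up-moves -> dd.
groups = {}
for p in paths:
    node = p[:2].count("u")
    groups.setdefault(node, []).append(CF3[p.count("u")])

# One conditional expectation per node: the result is a mapping, not a number.
cond_exp = {node: sum(v) / len(v) for node, v in groups.items()}
print(cond_exp)
```

The result is a mapping from states to numbers, which is the point of the first comment following the example: the conditional expectation regarding a σ-algebra is state-dependent.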


¹⁶ Given that the subsets are not empty, such a segmentation is called a partition.
The above example deserves two comments:


1. The conditional expectation is not just a number.¹⁷ Rather, there exist several
   values because for each event A a state-dependent expectation must be calculated.
   While the classical expectation is written as E[X], the notation for the conditional
   expectation


   E[X|ℱ]


   highlights this difference.
2. While our example deals with only a few events in generating the σ-algebra, the
   idea can also be implemented with large algebras.¹⁸


¹⁷ In any case this usually applies.


¹⁸ Since some of the relevant sets may have vanishing probabilities μ(·) = 0, the application of
Eq. (5.37) is not permissible.
6.1 Preliminary Remark: The Space of All Paths


Illustration of the Brownian Motion in Textbooks In many economic textbooks 
which address the Brownian motion one finds representations resembling those 
of Fig. 6.1. Let us initially focus on the blue path, a function frequently used to 
illustrate a typical path of a Brownian motion. Almost everyone would accept that 
the price of a share could develop as shown. In particular, economists find such a 
representation plausible. However, such an interpretation is more likely to mislead
than to contribute to an understanding of what the Brownian motion is all
about. Even worse, it conveys a misconception of the Brownian motion. Let us
explain this phenomenon by looking at a coin toss.

With a coin toss we have always assumed that the only events possible are heads 
and tails. Of course certain other events are conceivable: one possibility is that a
coin falls on its edge as shown in Fig. 6.2, or the coin could disintegrate into several
pieces, or it could entirely disappear. Yet all these possibilities are highly unlikely.

Does Fig. 6.2 describe the random outcome of the coin toss adequately? Certainly 
not! In fact, the picture can be called misleading. The same argumentation also 
applies to the blue path shown in Fig. 6.1! This single path represents only one of an 
infinite number of possible paths. Rather, the Brownian motion must be considered
as a dice with an infinite number of sides (instead of six) where the outcome is a
continuous function. Any such function is called a path or an event; and every single
path chosen is just as unlikely as the event shown in Fig. 6.2.

Returning to Fig. 6.1 let us concentrate on the black path. With respect to stock
prices everybody would consider the black path an unlikely path because of its
untypical (sinusoidal) shape; but the shape does not matter. The blue path is
as unlikely as the black path because both of them represent just one of an infinite
number of continuous functions.
Fig. 6.1 Textbook description of a Brownian motion (vertical axis: B_t, e.g., price; horizontal axis: time t)


Fig. 6.2 Unlikely result of a coin toss (this East German coin was issued to commemorate the 
200th anniversary of Gauss’ birthday displaying the normal distribution) 


If Fig. 6.2 as a typical coin toss triggers discomfort, why doesn’t Fig. 6.1 cause 
similar discomfort when it comes to Brownian motion? 


From Single Numbers to Intervals Concentrate on a situation in which the
random development of a company’s cash flows over time is of concern. Managers
responsible for planning may assume that the cash flows in the coming year will be
uniformly distributed in the interval between $3 million and $4 million. A decision-maker
could be interested in the consequences of the cash flows being exactly
$π million.¹ A reasonable person would dismiss such a discussion as a purely
academic gimmick: with a uniformly distributed random variable the probability of
a specific real number is zero. This case does not have to be discussed any further
since it is absolutely unlikely. It makes much more sense to select a relevant interval
for the cash flows, for example, between $3.00 million and $3.25 million, and
to study its effect on important business parameters such as profit, investment
volume, or firm value. Subsequently, the analysis of other cash flow intervals is
recommended. Only in this way does one get a useful understanding of the economic
consequences resulting from the assumption of uniformly distributed cash flows
between $3 million and $4 million.


¹ That would be $3,141,592 if we limit ourselves to full dollar amounts.


From a Single Path to All Paths After these reflections we return to Fig. 6.1
showing two paths of a Brownian motion. As the probability of a payment of exactly
$π million is zero, the same applies to the probability of each of the two paths. In
fact, it is quite conceivable that the cash flows will follow not the sinusoidal but
the blue movement that seems to be random. From the perspective of probability
theory, however, both the sinusoidal and the blue movement are equally unlikely. In order
to arrive at meaningful statements, we must focus on the entire event space of the Brownian
motion. Looking at individual paths is meaningless. Rather, one must consider all
paths.


From Universal Statements to Probability Statements Unfortunately, by simply
looking at all paths C[0, ∞) one goes too far. Stating “A particular property xyz
applies to all paths of the set C[0, ∞)” could cause a problem. By doing so we
would include paths that are not of interest because nobody would consider them as
random (remember the sinusoidal path!). While these functions are annoying, they
still do exist. A “trick” is required to ignore or disregard them.

The crucial step in disregarding annoying functions is to switch from universal to
probability statements.² Unfortunately, this switch demands a hard-to-read double
negation. We have to state: “It is not likely that the property xyz does not apply to
all paths.” This is the trick allowing us to ignore the annoying functions. The same
result, however, can also be achieved by the following (positive) statement: a set of
paths has the property xyz almost everywhere if and only if the set’s probability
equals 1.

Then the only question remaining is how to construct the relevant probability.
This probability is the so-called Wiener measure in the space of continuous
functions. In the preceding chapters we have developed important foundations
(σ-algebra, definition of the measure) for defining this measure. Once the Wiener
measure μ is defined we can state the following: a Brownian motion “has property
xyz” if and only if the set of functions in C[0, ∞) with this property has measure 1.
Table 6.1 illustrates our remarks.

Let us summarize. The definition of the Brownian motion we will present shortly 
may irritate readers with economic backgrounds. Why? 


1. Economists tend to look at Brownian motion by considering only a few paths,
   perhaps two, ten, or a hundred, instead of recognizing that this stochastic process
   consists of an infinite number of paths. This characteristic of the Brownian
   motion is easily overlooked as the aspect of infinity manifests itself only in the
   inconspicuous symbol C[0, ∞).


² We saw in Definition 3.3 on page 55 how one can “get rid of” annoying properties of an
object with the term almost everywhere.
Table 6.1 Analogy between coin toss and Brownian motion

            | Ideal coin     | Brownian motion
State       | Coin face      | Path
Event space | Heads or tails | All continuous functions [0, ∞) → ℝ
Probability | 1/2            | Wiener measure μ

2. In addition to considering only a few paths, many economists attach importance to
   the shape of individual paths. The focus of the definition on page 93, however,
   will be placed on something quite different: the focus has to be the probability
   μ which is assigned to the sets of paths. This is of crucial importance. Some
   economists may not even realize that such a probability exists.


Two further remarks seem to be necessary. First, individual paths of the Brownian
motion are differentiable at almost no point. Second, almost all paths are
non-monotonic in any interval, however small the interval may be. Economists without
a sufficient mathematical background may fail to appreciate the significance
of those statements.³ Without detailing these mathematical properties we can only
state that the paths can by no means look as shown in Fig. 6.1. In light of these
two remarks, the typical paths or jagged functions frequently found in economic
textbooks are anything but typical.


6.2 Wiener Measure on the Space of Continuous Functions 


Equipped with the background presented in the previous chapters we are approaching
the core of our book. The following material is based on the work of the American
mathematician Norbert Wiener, who in the 1920s put the Brownian motion on solid
mathematical grounds.


Binomial Model and Space of Continuous Functions We had already illustrated
a binomial model on page 25 in Fig. 2.5. A path is a complete “one-way tour”
through a tree from its origin to one of the ends.

Contrary to the binomial model, the paths in a Brownian motion are not based
on sequential upward or downward movements of an economic quantity. Rather, a
path is a continuous function. Furthermore, no additional assumptions are needed; in
particular, the functions need be neither differentiable nor monotonic. Each
of these continuous functions describes how the relevant variable can develop in
time.


³ We will give corresponding explanations on page 95.
Fig. 6.3 Two elementary events in the event space C[0, ∞) from Fig. 3.5


The Measure of a Set of Continuous Functions In order to determine the measure
of a set of continuous functions we must first clarify which sets can be measured at
all. This task was described in our Example 3.7 on pages 48 to 50. We
follow the pattern used to construct the σ-algebra. The scheme consisted of a
finite number of design steps.⁴ In the first step, an arbitrary time t is selected and
an interval [a, b] is chosen. Thus, a measurable set contains all functions passing
through the interval [a, b] at time t.

Let us look at all these functions. For this purpose we redraw Fig. 3.5 from
page 48 but eliminate the sinusoidal path as it does not go through the [a, b]
interval at time t; therefore, it does not belong to the set to be measured (Fig. 6.3).
Remember that the set of all paths going through the [a, b] interval is called a
cylinder set.

We determine the measure of this cylinder set as follows⁵:

\[
\mu\bigl(\{f : f(t) \in [a, b]\}\bigr) = \int_a^b \varphi_t(x) \, dx. \tag{6.1}
\]


It is easy to see that the measure depends on both time t and the interval [a, b]. For
an interval containing zero, the measure of the cylinder set decreases with increasing
time t because the larger the variance, the flatter the density. And the larger the
interval [a, b], the larger is the measure of this set.

The largest possible value of a measure of any cylinder set is 1, implying that the
set contains all continuous functions.⁶ Since the density function is never negative,
the Wiener measure is a probability measure.


⁴ For details refer to pages 48 to 50.


⁵ Here φ_{σ²}(·) represents the density of the normal distribution with expectation 0 and variance σ².


⁶ This is the borderline case for a → −∞ and b → ∞ for any time t.
Usually one denotes the antiderivative of the density function φ_t(x), i.e., the
distribution function, with the symbol Φ_t(x). Using the fundamental theorem of
calculus we can write the Wiener measure in the following form:

\[
\mu\bigl(\{f : f(t) \in [a, b]\}\bigr) = \Phi_t(b) - \Phi_t(a). \tag{6.2}
\]
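Equation (6.2) can be evaluated with nothing more than the error function. The following sketch assumes, as in the footnote on the density, that the subscript t denotes the variance of a normal distribution with expectation zero; the helper names `Phi` and `cylinder_measure` are illustrative.

```python
from math import erf, sqrt

def Phi(x, var):
    """Distribution function of the normal law with mean 0 and variance var."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0 * var)))

def cylinder_measure(t, a, b):
    """Wiener measure of the cylinder set {f : f(t) in [a, b]}, cf. Eq. (6.2)."""
    return Phi(b, t) - Phi(a, t)

# The same interval [-1, 1] observed at different times t.
for t in (0.5, 1.0, 4.0):
    print(t, round(cylinder_measure(t, -1.0, 1.0), 4))

# A wider interval at the same time t = 1.
print(round(cylinder_measure(1.0, -2.0, 2.0), 4))
```

For the interval [−1, 1], which contains zero, the printed measure shrinks as t grows, and it grows when the interval is widened at a fixed time.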


Let us determine the measure if the length of interval [a, b] goes to zero. In the
case of a = b, the measure is obviously zero. But for an infinitesimally small interval
[x, x + dx] the difference of the distribution functions Φ_t(x + dx) − Φ_t(x) tends
to φ_t(x) dx. The resulting measure is

\[
\mu\bigl(\{f : f(t) \in [x, x + dx]\}\bigr) = \varphi_t(x) \, dx. \tag{6.3}
\]

We will use this equation on page 95.

In the next step we define the Wiener measure of a cylinder set having not only
one but two points in time. Remember the construction rules on page 49 (Fig. 3.6):
the paths belonging to this cylinder set have the property that they traverse certain
intervals. At the first (earlier) time t the function value f(t) must lie in the interval
[a, b]; at the second (later) time s the difference of the function values f(s) − f(t)
must run through the interval [c, d]. The measure is defined as follows:


\[
\mu\bigl(\{f : f(t) \in [a, b] \text{ and } f(s) - f(t) \in [c, d]\}\bigr)
= \int_a^b \int_c^d \varphi_t(x) \, \varphi_{s-t}(y) \, dy \, dx. \tag{6.4}
\]

This definition requires further explanation. We see two integrals. First, we recognize
the integral over x contained in the interval [a, b] which we already used above.
Second, we can identify another integral over a variable y contained in the interval
[c, d]. This variable y stands for the difference f(s) − f(t), which is normally
distributed with variance s − t.

The definition is easier to understand if one looks at small intervals [x, x + dx]
and [y, y + dy] instead of the (arbitrarily large) intervals [a, b] and [c, d]. Using a
notation similar to (6.3) with density functions leads to

\[
\mu\bigl(\{f : f(t) \in [x, x + dx] \text{ and } f(s) - f(t) \in [y, y + dy]\}\bigr)
= \varphi_t(x) \, \varphi_{s-t}(y) \, dx \, dy. \tag{6.5}
\]

Equation (6.5) highlights that the product of two density functions must be
determined in order to obtain the measure of an infinitesimally small range. The
product of density functions comes into play with independent quantities. Thus,
we can see that the function value f(t) at time t should be independent of its further
development described by the difference f(s) − f(t).
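Because the density in Eq. (6.5) is a product, the measure of a two-time cylinder set with finite intervals factorizes into two normal probabilities. The sketch below computes this product and compares it with a simple simulation; the function names are hypothetical, and the simulation draws f(t) and f(s) − f(t) independently, so it is only a sanity check of the arithmetic, not a proof of independence.

```python
import random
from math import erf, sqrt

def Phi(x, var):
    """Distribution function of the normal law with mean 0 and variance var."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0 * var)))

def two_time_measure(t, s, a, b, c, d):
    """Measure of {f(t) in [a, b] and f(s) - f(t) in [c, d]} for s > t,
    obtained by integrating the product density of Eq. (6.5)."""
    return (Phi(b, t) - Phi(a, t)) * (Phi(d, s - t) - Phi(c, s - t))

def simulate(t, s, a, b, c, d, n=100_000, seed=0):
    """Relative frequency of the same event under independent normal draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.gauss(0.0, sqrt(t))        # value f(t), variance t
        y = rng.gauss(0.0, sqrt(s - t))    # increment f(s) - f(t), variance s - t
        hits += (a <= x <= b) and (c <= y <= d)
    return hits / n

exact = two_time_measure(1.0, 3.0, -1.0, 1.0, 0.0, 2.0)
approx = simulate(1.0, 3.0, -1.0, 1.0, 0.0, 2.0)
print(round(exact, 4), round(approx, 4))
```

The two printed numbers should agree up to simulation noise, which for this sample size is of the order of a few thousandths.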

For each further point in time multiply expression (6.5) with the additional term
φ_{t_{n+1} − t_n}(·). Again the variance of this normal distribution depends on its distance to
the earlier point in time. For example, the measure for the cylinder set considering
a third time r > s results in

\[
\mu\bigl(\{f : f(t) \in [x, x + dx] \text{ and } f(s) - f(t) \in [y, y + dy]
\text{ and } f(r) - f(s) \in [z, z + dz]\}\bigr)
= \varphi_t(x) \, \varphi_{s-t}(y) \, \varphi_{r-s}(z) \, dx \, dy \, dz. \tag{6.6}
\]


6.3 Two Definitions of the Brownian Motion 
At last, we can define the Brownian motion formally. 


Definition 6.1 (Brownian Motion, Mathematically) A Brownian motion is given
by a probability space (C[0, ∞), σ-algebra, μ) with μ as Wiener measure.⁷


This definition is extraordinarily terse and cannot be beaten in brevity. The 
definition uses terms which are not easily understood by non-mathematicians. 
However, we hope that careful reading of the previous chapters of this book will 
help to overcome any obstacles. 

Economists usually define the Brownian motion quite differently.⁸


Definition 6.2 (Brownian Motion, Economically) The Brownian motion W(t)
meets the following three properties:


1. Starting at the origin: W(0) = 0 applies.

2. Continuity: W(t) is continuous in t.

3. Distribution: the increments W(s) − W(t) for s > t are normally distributed with
   expectation zero and variance s − t. For adjacent time intervals the increments
   are independent of each other.


One might be inclined to think that both definitions express very different objects. 
However, that is only seemingly so. In fact, both definitions are equivalent! 


Equivalence of Both Definitions It is possible to prove that Definition 6.2 can be 
derived from Definition 6.1. To realize this, we first need to understand the meaning 
of W(t). 

W(t) is a random variable. More precisely, the random variable W(t) returns
a real number for every event. This real number depends not only on time t but
also on the event. In Chap. 4 (on page 59) we have stated that these events are the
cause for the observed result. What are causes and events in this definition? Further,


⁷ The associated σ-algebra was discussed on page 49. For the Wiener measure we refer to pages 91
to 93.


⁸ See for example Hassler (2007, p. 117).


94 6 Wiener’s Construction of the Brownian Motion 


how can one imagine the functional dependency between individual events and the 
corresponding real numbers? 

In Chap. 3 (on page 26) we had identified the set of all future events as the space
Ω = C[0, ∞). Each continuous function f defined on the interval [0, ∞) represents
a possible event, in fact an elementary event. This event, i.e., this function f,
determines the observed real number. While in the frequently used dice example
only six elementary events are possible, in the Brownian motion an infinite
number of elementary events exist. The set of conceivable continuous
functions f on the interval [0, ∞) is uncountable. Any continuous function f(·)
that can be drawn from the “C[0, ∞)-lottery” represents a random event; and each
of these functions returns a certain function value f(t) at time t.

Thus, the random variable W(t) can be described as follows. W(t) is the random
variable that assigns to an elementary event f, i.e., a continuous function, the value
that this function f assumes at time t. In order to describe W(t) formally one can
state

\[
W(t)\,\bigl(f \in C[0, \infty)\bigr) := f(t) \tag{6.7}
\]

random variable (event f) := numerical value.


In interpreting (6.7) one has to be careful because f appears on both the left and the
right, however, with two different meanings. On the left we see the notation of the
random variable W(t). The value of this random variable depends (causally) on a
random event f, and such an event must be a function from the event space C[0, ∞).
On the right, f(t) is the value the randomly selected function f assumes at time t.

The function f performs two tasks in Eq. (6.7). Appearing on the left side of (6.7)
it is the cause of uncertainty; it triggers the fluctuations of the Brownian motion. On
the right side the function f describes the fluctuation in detail by specifying for each
t how large the fluctuation will be. This is accomplished by the term f(t).

To this point we have dealt with W(t). We must now focus on the difference
W(s) − W(t) appearing in the economic definition presented above. This difference
can be interpreted as a random variable and must therefore depend on a random
event. This event must again be a continuous function f ∈ C[0, ∞). Analogous to
the above considerations, the result of the random variable is not the value of this
function at time t but its change between the times t and s:

\[
\bigl(W(s) - W(t)\bigr)\,\bigl(f \in C[0, \infty)\bigr) := f(s) - f(t) \tag{6.8}
\]

random variable (event) := numerical value.
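The double role of f in Eqs. (6.7) and (6.8) may be easier to see when the random variables are written as functions that take a path as their argument. This is a purely illustrative sketch: `W`, `increment`, and the two sample paths are hypothetical names, and the paths are ordinary deterministic functions starting at zero, not draws from the Wiener measure.

```python
from math import sin

def W(t):
    """Random variable W(t): maps an elementary event f (a path) to f(t)."""
    return lambda f: f(t)

def increment(s, t):
    """Random variable W(s) - W(t): maps a path f to f(s) - f(t)."""
    return lambda f: f(s) - f(t)

# Two elementary events from C[0, oo), both starting at zero.
sine_path = sin
linear_path = lambda t: 0.5 * t

print(W(2.0)(linear_path))               # value of the linear path at t = 2
print(increment(3.0, 1.0)(linear_path))  # its change between t = 1 and s = 3
print(W(0.0)(sine_path))                 # the path starts at zero
```

The event f is the argument (the cause), and the number returned is the function value or the change in function values (the observed result).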


At last we can turn to proving that the mathematical Definition 6.1
actually fulfills Properties 1 to 3 in the economic Definition 6.2.


Proof We will show that the three properties are true. 
1. Start at the origin: The random variable W(0) always corresponds to the function
   value at t = 0 for every conceivable event. Since all functions from the set
   C[0, ∞) start by definition in zero,⁹ property 1 is satisfied.

2. Continuity: Continuity requires that each realization of the random variable (i.e.,
   each event) represents a continuous function in t. Since the space Ω of the Wiener
   measure consists only of functions being continuous in t, property 2 is also
   satisfied.

3. Distribution: To prove that the increments of Brownian motion are independent
   and normally distributed, we can determine the distribution function. For this
   purpose we calculate the probability

   \[
   \operatorname{Prob}\{f : (W(s) - W(t))(f) \le a\} = \mu\bigl(\{f : (W(s) - W(t))(f) \le a\}\bigr). \tag{6.9}
   \]

   We show that this expression corresponds to the normal distribution Φ_{s−t}.
   Concentrating on Eq. (6.5), which defines the Wiener measure μ, we immediately
   see the following two properties:

   (a) The two density functions are multiplied by each other. This is to say that the
       random variables W(t) and W(s) − W(t) must be independent of each other.

   (b) Both random variables are normally distributed with densities φ_t and φ_{s−t}.

This was to be shown. □


6.4 Often Neglected Properties of the Brownian Motion 


The following characteristics of the Brownian motion are often neglected in 
economic textbooks. Our purpose for discussing these properties is not for the sake 
of completeness. Rather, the understanding of further properties will enhance the 
skepticism in the use of Brownian motion when modeling economic processes. 


Non-differentiability One can prove that the paths of Brownian motion cannot be
differentiated μ-almost everywhere.¹⁰

Figure 2.6 (page 27) illustrated that the sine function as well as the linear function 
represent conceivable paths of Brownian motion. Such functions are known to be 
differentiable. However, the probability that such paths will occur is zero. All of 
them are extremely unlikely. 

While we do not intend to prove non-differentiability we at least want to make it 
plausible. For this purpose we concentrate on an arbitrary path W (t) of the Brownian 
motion. Assuming that this path is differentiable implies that its derivative W’(t) 
exists. 


⁹ See page 27.
¹⁰ The phrase “almost everywhere” is described in Sect. 3.8.

For differentiable functions the mean value theorem of differential calculus
applies. This proposition says: if a function f is continuous on the closed interval
[a, b] (with a < b) and differentiable on the open interval (a, b), there exists at least
one s ∈ (a, b) with

\[
f'(s) = \frac{f(b) - f(a)}{b - a}. \tag{6.10}
\]

For a small time step ε > 0, it follows that there is
an s ∈ (t, t + ε) such that the difference W(t + ε) − W(t) can be written as

\[
W(t + \varepsilon) - W(t) = W'(s) \cdot \varepsilon. \tag{6.11}
\]


This is the key to understanding the assertion that the paths of the Brownian
motion cannot be differentiated. Equation (6.11) is an identity of two random
variables, implying that we can form variances on both sides. It follows from the
properties of the Brownian motion that the left side of this equation is a normally
distributed random variable with expectation zero and variance t + ε − t = ε. Thus

\[
\varepsilon = \operatorname{Var}[W(t + \varepsilon) - W(t)]. \tag{6.12}
\]

Further, the mean value theorem (6.11) tells us what happens on the right side,

\[
\operatorname{Var}[W'(s) \cdot \varepsilon] = \varepsilon^2 \operatorname{Var}[W'(s)]. \tag{6.13}
\]

Combining the two equations results in

\[
\frac{1}{\varepsilon} = \operatorname{Var}[W'(s)]. \tag{6.14}
\]

Letting ε → 0 the left side of this equation approaches infinity. The right side of the
equation goes to Var[W′(t)], and we have the logical contradiction Var[W′(t)] = ∞.

Since the first derivative does not exist, the paths of a Brownian motion are
(μ-almost everywhere) not differentiable. Non-mathematical readers will most likely
have trouble imagining such functions. We will hardly be able to change that.
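The explosion of the variance in Eq. (6.14) can be made visible numerically. The sketch below draws increments W(t + ε) − W(t) from a normal distribution with variance ε, as property 3 of Definition 6.2 prescribes, and estimates the variance of the difference quotient; the function name, sample size, and seed are arbitrary choices for illustration.

```python
import random
from math import sqrt

def quotient_variance(eps, n=50_000, seed=1):
    """Sample variance of (W(t+eps) - W(t)) / eps with increments ~ N(0, eps)."""
    rng = random.Random(seed)
    q = [rng.gauss(0.0, sqrt(eps)) / eps for _ in range(n)]
    mean = sum(q) / n
    return sum((v - mean) ** 2 for v in q) / (n - 1)

# The estimated variance should track 1 / eps as the time step shrinks.
for eps in (1.0, 0.1, 0.01, 0.001):
    print(eps, round(quotient_variance(eps), 1), 1 / eps)
```

The estimates grow roughly like 1/ε, so the difference quotient does not settle down to a finite limit as the time step shrinks. This illustrates the plausibility argument; it does not replace the proof.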


Infinite Zero Crossings at the Beginning Many economists do not seem to care
whether a path of a Brownian motion is differentiable or not. Next, we will discuss
a property which is even more outrageous. We are talking about the intersections of
any path with the abscissa: how often does such a path cross the abscissa in a certain
time interval?

Fig. 6.4 Curve of the arccos function in the interval [0, 1] (the function values fall from π/2 at 0 to 0 at 1)

Again, we must realize that this is not about the behavior of a single path.
One should remember that the set of all paths of the Brownian motion must be
considered. Thus, the only meaningful question is: what is the probability that the
paths of the Brownian motion cross the abscissa in an interval of two consecutive
points in time (t₀, t₁)? With reference to the Wiener measure discussed in Sect. 6.2
we have to determine the measure¹¹

\[
\mu\bigl(\{f : \exists\, t \in (t_0, t_1) \;\; f(t) = 0\}\bigr). \tag{6.15}
\]


In words: how large is the Wiener measure for all functions of the Brownian motion
taking the value zero at least once in the interval (t₀, t₁)? We will only provide an
answer to this question without proving it.¹² Interestingly, the probability depends
only on the quotient t₀/t₁. The following applies:

\[
\mu\bigl(\{f : \exists\, t \in (t_0, t_1) \;\; f(t) = 0\}\bigr) = \frac{2}{\pi} \arccos\left(\sqrt{\frac{t_0}{t_1}}\right). \tag{6.16}
\]
The arccos function (arc cosine function) is most likely not well known to non- 
mathematicians. For this reason its shape is illustrated in Fig. 6.4. 

What characteristics does the arccos function have if both t₀ and t₁ tend to
zero? Although the quotient t₀/t₁ can take on any value, one can easily think of
examples with the quotient being very small. Consider the following intervals:
(t₀, t₁) = (1/n², 1/n) with n being 2, 3, 4, ... With increasing n these intervals are
getting closer and closer to zero. Since t₀/t₁ = 1/n → 0, the probability of a
zero-crossing of the Brownian motion increases and finally approaches 1. Thus, almost
every Brownian path has an infinite number of zeros in the neighborhood of its
origin. Such behavior of a function is very bizarre.
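The behavior of the zero-crossing probability along the intervals (1/n², 1/n) can be tabulated directly. The sketch assumes that Eq. (6.16) has the standard form (2/π) · arccos(√(t₀/t₁)); the function name `zero_probability` is illustrative.

```python
from math import acos, pi, sqrt

def zero_probability(t0, t1):
    """Probability of at least one zero in (t0, t1); assumed form of Eq. (6.16)."""
    return 2.0 / pi * acos(sqrt(t0 / t1))

# Intervals (1/n^2, 1/n) that move ever closer to the origin.
for n in (2, 10, 100, 10_000):
    print(n, round(zero_probability(1 / n**2, 1 / n), 4))
```

As n grows, the quotient t₀/t₁ = 1/n shrinks, the arccos term approaches π/2, and the probability approaches 1.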


Boundlessness We will now consider another interesting characteristic of the 
Brownian motion which is also frequently omitted in economic textbooks. Plots 
of Brownian paths displayed in such publications usually resemble Fig. 6.5. 

¹¹ The symbol ∃t means that “there exists a t where ... applies.”
¹² A proof can be found in Klebaner (1998, p. 76).

Fig. 6.5 Typical textbook image of a Brownian path

Let us concentrate on a very small interval on the time axis [t, t + ε] and ask the
following question: what is the probability that the paths of a Brownian motion are
bounded in this interval? In addressing this question we will initially focus on the
upper bound. How large is the probability that all Brownian paths will not exceed an
upper bound in the very small interval? More precisely: what probability measure
is to be assigned to the set of all paths that will not exceed an arbitrarily high upper
bound K in the time interval [t, t + ε]? In the following we will show that the answer
to this question is zero. If one tries to explain this result intuitively one could state:
“In every small interval practically all paths are unbounded.”

This assertion cannot be derived from Fig. 6.5 since the path is clearly restricted
everywhere. In order to resolve this (apparent) contradiction we have to realize once
again what a Brownian motion is: it is not a single path as shown in Fig. 6.5 with
limited fluctuations. Rather, we are dealing with an infinite number of paths that
fulfill a given property (however defined) with a certain probability. Concerning
the property of unboundedness our assertion can now be stated more precisely:
if we consider in the interval [t, t + ε] the set of all paths having upper bounds
simultaneously,¹³ this set has measure zero. Paths with upper bounds are therefore
unlikely. They hardly ever happen.

Since proving our assertion is exceedingly involved, we will refrain from 
presenting a formal proof; rather we will try to substantiate the assertion by using 
appealing arguments that can be found in the literature on Brownian motions. Let 
us first remember that the Brownian motion is the set of all continuous functions
C[0, ∞). Continuous functions have a maximum in closed intervals; however, the
issue of how this maximum is distributed remains open. A clear answer can be found
in the theory of Brownian motion. For this purpose we define K as the upper bound
and consider the set of all paths f(s) that remain below this limit in the interval
[t, t + ε]. This set is described as

\[
\Bigl\{f : \max_{t \le s \le t + \varepsilon} f(s) \le K\Bigr\}. \tag{6.17}
\]


¹³ Of course the path shown in Fig. 6.5 belongs to this set.
Determining the number of paths contained in this set translates to the mathematical
problem of deriving the measure μ of this set, which is given by¹⁴

\[
\mu\Bigl(\Bigl\{f : \max_{t \le s \le t + \varepsilon} f(s) \le K\Bigr\}\Bigr) = \sqrt{\frac{2}{\pi \varepsilon}} \int_0^K e^{-\frac{x^2}{2\varepsilon}} \, dx. \tag{6.18}
\]

It is helpful to remember that for K → ∞ the following holds:

\[
\int_0^\infty e^{-\frac{x^2}{2\varepsilon}} \, dx = \sqrt{\frac{\pi \varepsilon}{2}}. \tag{6.19}
\]

For K < ∞, however, it follows immediately

\[
\sqrt{\frac{2}{\pi \varepsilon}} \int_0^K e^{-\frac{x^2}{2\varepsilon}} \, dx < 1. \tag{6.20}
\]


Expression (6.20) implies: regardless of how large K is, we will never arrive at a 
situation where all paths of a Brownian motion fall below K with probability 1. 
Incidentally, this result applies to every small time interval [t, t + ε] since all our 
statements are independent of the actual value of ε. Thus, it can be stated that the 
Brownian motion is unbounded even in the smallest interval. 
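
The measure of the set of paths staying below K can be evaluated numerically. The following is a minimal sketch, assuming the measure has the form sqrt(2/(πε)) ∫₀ᴷ exp(−x²/(2ε)) dx, which after the substitution u = x/sqrt(2ε) equals the error function erf(K/sqrt(2ε)). The function name `measure_below` is our own choice for illustration.

```python
import math

def measure_below(K: float, eps: float) -> float:
    """Measure of the paths whose maximum on an interval of length eps stays below K."""
    # sqrt(2/(pi*eps)) * integral_0^K exp(-x^2/(2*eps)) dx = erf(K / sqrt(2*eps))
    return math.erf(K / math.sqrt(2.0 * eps))

# The measure grows with K but stays below 1 for every finite K.
for K in (0.5, 1.0, 2.0, 3.0):
    print(K, measure_below(K, eps=1.0))
```

For very large K the computed value is rounded to 1.0 in floating-point arithmetic; this is a limitation of the number format, not of the inequality.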

To this point we only dealt with the upper bounds of Brownian paths. Using the 
same logic, one can also show that no lower bound exists for an infinite number of 
paths. 

The above possibly confusing result is due to the fact that we have insisted on 
including all Brownian paths. However, if we consider also the probabilities of 
the Brownian paths we obtain results which are far less irritating. To this end we 
set a limit of K for the upper bound and —K for the lower bound. Based on our 
previous considerations we know that not all paths can meet finite bounds in any 
small interval. We are interested in finding upper and lower bounds such that only 
x% of all paths will fall within these bounds, however small the interval. 

Figure 6.6 shows different funnels each labeled with respective probability 
levels. Determining these funnels or bounds for each level of probability can be 
accomplished by applying sophisticated mathematics; however, we refrain from 
presenting the details.¹⁵ 

A probability of x% indicates that the paths which run within the bounds 
constitute x% of all paths of a Brownian motion. The funnels widen with increasing 
probability. Of course for x = 100 % the funnel is boundless. 
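
To get a feeling for how such bounds grow with the probability level, one can invert the distribution of the maximum. The sketch below is a simplification and rests on an assumption: it uses only the one-sided distribution of the maximum (upper bound K alone), which can be written as 2Φ(K/sqrt(ε)) − 1 with Φ the standard normal distribution function. The two-sided funnels of Fig. 6.6 require more involved mathematics, so the numbers here do not reproduce the figure. The name `upper_bound_for` is hypothetical.

```python
import math
from statistics import NormalDist

def upper_bound_for(x: float, eps: float) -> float:
    """Bound K such that a share x (0 < x < 1) of all paths stays below K on an interval of length eps."""
    # Solve 2*Phi(K/sqrt(eps)) - 1 = x for K.
    return math.sqrt(eps) * NormalDist().inv_cdf((1.0 + x) / 2.0)

for x in (0.50, 0.75, 0.90):
    print(f"{x:.0%}: K = {upper_bound_for(x, eps=1.0):.4f}")
```

The bound increases with x and has no finite value for x = 1, in line with the statement that the funnel for 100 % is boundless.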


14 The distribution of the maximum can be found for example in Karatzas and Shreve (1991, p. 96). 
15 See Karatzas and Shreve (1991, p. 96). 

Fig. 6.6 The figure shows for several probabilities x% those barriers within which a 
total of x% of all paths of a Brownian motion run (barriers K for 50 %, 75 %, and 90 % 
are drawn above and below the time axis) 


Non-monotonicity In economic textbooks Brownian paths are usually constructed 
by the approximation¹⁶ 

\[ \Delta W = \varepsilon \sqrt{\Delta t}. \tag{6.21} \]

Here ΔW represents the change of a Brownian path that takes place after a (short but 
finite) time period Δt, with ε being a standard normally distributed random 
number.¹⁷ Returning to Fig. 6.5 we see an example of a corresponding path at 
fixed points in time Δt, 2Δt, 3Δt, ... The path is created by linearly connecting 
the values approximated by Eq. (6.21). Such a piecewise linear function is of course 
monotonically increasing or decreasing in any time interval Δt. 
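
The approximation can be sketched in a few lines. This is only an illustration, assuming the increment rule ΔW = ε·sqrt(Δt) with independent standard normal draws ε; the function name `simulate_path`, the start value 0, and the seed are our own choices.

```python
import math
import random

def simulate_path(n_steps: int, dt: float, seed: int = 0) -> list[float]:
    """Values of an approximated Brownian path at times 0, dt, 2*dt, ..., n_steps*dt."""
    rng = random.Random(seed)
    path = [0.0]  # the path starts at zero
    for _ in range(n_steps):
        eps = rng.gauss(0.0, 1.0)                   # standard normal random number
        path.append(path[-1] + eps * math.sqrt(dt))  # add the increment eps * sqrt(dt)
    return path

path = simulate_path(n_steps=10, dt=0.01)
print(path)
```

Connecting consecutive values linearly yields a piecewise linear path that is monotonic on each of the ten subintervals.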

However, the property of monotonicity is lost once we abandon the 
approximation approach and return to the Brownian motion. Instead of analyzing a 
single path one must consider the set of infinitely many paths having the property of 
being monotonically rising or falling in any infinitely small time interval, i.e., Δt → 
0. It can be shown that the measure of this set is zero.¹⁸ Monotonically growing or 
decreasing paths in arbitrary time intervals are therefore entirely unlikely, even if a 
single path may have monotonic sections. 


16 See for example Hull (2015, p. 304 ff). 
17 See also Eq. (1.4) on page 5. 
18 See for example Klebaner (2005, p. 64). 


Anyone writing a book will rarely follow a plan that was not revised several times 
during the process. This was definitely the case when this book was written. We have 
discussed many different versions before we arrived at the current format. In some of 
these versions mathematical terms like “convergence of functions” or “cardinality of 
sets” played an important role. In the end, we found a way to discuss the Brownian 
motion without using these terms explicitly. The obvious consequence could have 
been to simply drop this material. 

Discussions with students and colleagues taught us that these topics can also 
be of use in several other areas of economics. Therefore we decided to leave the 
supplements in our book. The four subsequent sections can be read independently of 
each other. The entire chapter can be skipped for the understanding of the Brownian 
motion. 


7.1 Cardinality of Sets 
Imagine adding 0 to the set of scores on a die as another element: 
{0, 1, 2, 3, 4, 5, 6}. 


Obviously, this set is larger than the original set: instead of six there exist now seven 
elements. With this simple fact in mind, one is inclined to conclude that this idea 
will also be applicable in the case of infinite sets. For example, if we compare the set 
N of all natural numbers with the set Z of integers, it seems reasonable to suppose 
that Z is greater than N. 

However, one cannot prove whether such a proposition is correct or false by 
looking at the number of elements. This number is infinite in Z as well as N, and 
we had already realized that infinity is not a number that can be used to perform 
simple arithmetic operations such as addition or comparisons. Thus, one has to 
create another concept if one wants to compare infinite sets. This boils down to 
cardinality. 

If one looks at infinite sets, the results seem to contradict the common sense 
acquired from dealing with finite sets. First, one might think that the set of natural numbers is smaller than 
the set of integers since all negative values −1, −2, ... are missing. However, one 
can prove by a simple consideration that this conclusion is mistaken. Rather, it can be 
shown that the set of integers is exactly as large as the set of natural numbers, or that 
both have “the same cardinality,” which we will explain below. This underlines the 
fact that infinity must be handled very carefully. It is better not to rely on common 
sense or “intuition”! 

The idea of cardinality is to employ a one-to-one relation when comparing 
two sets rather than counting their elements. Two sets are said to have the same 
cardinality (or are “equal in size”) only if there exists a one-to-one relation between 
all their elements. 

With finite sets, counting elements and using one-to-one relations lead to the same 
result. Figure 7.1 illustrates that the set with seven elements is greater than the set 
with six elements: one element from the set {0, 1,..., 6} will never find a “partner.” 

In the case of the two infinite sets, however, the outcome is surprising. This 
is demonstrated by the assignment in Fig. 7.2: each natural number is mapped to 
exactly one integer and this mapping is one-to-one. One can clearly observe that 
both every natural number and every integer appear exactly once. Those preferring 
formulas might use 


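\[ f : \mathbb{N} \to \mathbb{Z}, \qquad f(n) = \begin{cases} -\dfrac{n}{2}, & \text{if } n \text{ is even;} \\[4pt] \dfrac{n+1}{2}, & \text{if } n \text{ is odd.} \end{cases} \tag{7.1} \]
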
f is a function that obviously assigns an integer to each natural number n and f is 
also reversible in the sense that every integer in Z is also captured. 
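
The one-to-one assignment can be checked on a finite stretch of numbers. The sketch below assumes that N starts at 0 and that even n are mapped to −n/2 and odd n to (n + 1)/2, which matches the assignment 3 ↦ 2 indicated in Fig. 7.2; the name `f_inverse` is our own.

```python
def f(n: int) -> int:
    """Map a natural number (starting at 0) to an integer."""
    return -(n // 2) if n % 2 == 0 else (n + 1) // 2

def f_inverse(z: int) -> int:
    """Recover the natural number that is mapped to the integer z."""
    return 2 * z - 1 if z > 0 else -2 * z

print([f(n) for n in range(7)])
# Every integer between -500 and 500 is hit exactly once by n = 0, ..., 1000.
values = [f(n) for n in range(1001)]
print(sorted(values) == list(range(-500, 501)))
```

Such a finite check is of course no proof, but it shows how the natural numbers alternate between the positive and the negative integers.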

The idea of cardinality will be further illustrated with another example. 


Example 7.1 (Cantor’s Diagonal Argument) The set of nonnegative rational numbers 
Q₊ has the same cardinality as the set of natural numbers. To show the 
equivalence it is necessary to prove, analogously to Fig. 7.2, that it is possible to 
uniquely assign all nonnegative rational numbers to natural numbers. 


Fig. 7.1 The two finite sets {0, 1, ..., 6} and {1, 2, ..., 6} do not have equal cardinality 

Fig. 7.2 The infinite sets N and Z have equal cardinality 

Fig. 7.3 Cantor’s diagonal argument to prove that N and Q₊ have equal cardinality 


The rational numbers Q₊ consist of all fractions m/n with m and n being positive 
natural numbers. These rational numbers are now arranged in an infinite 
two-dimensional matrix as shown in Fig. 7.3.¹ The arrows shown illustrate how one 
may imagine the one-to-one correspondence between the natural and the rational 
numbers: the 1 is assigned to the first fraction on the path of arrows, the 2 to the 
second fraction, the 3 to the third, and so on. 

This procedure would create a one-to-one relation if there were not an annoying 
blemish. The right matrix contains too many elements. Fractions such as 1/2, 2/4, 
3/6, ... or 1/3, 2/6, 3/9, ... are actually identical and do not represent different rational 
numbers at all. Therefore, they must not be assigned to different natural numbers. 
One has to make sure that they are accounted for only once. This is achieved by 
“thinning-out” the right matrix. All fractions m/n whose m and n are not 
coprime are deleted. In this case the diagonal construction is only carried out for 
values that are coprime. The formal proof is much more complicated due to this 
“thinning-out” and must, if one wants to be formally precise, be conducted by 
mathematical induction. However, we will not present the details of this proof. 
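
The diagonal enumeration with “thinning-out” can be sketched as follows. The order in which each diagonal is traversed is an assumption on our part and may differ from the arrows in Fig. 7.3; the function name `positive_rationals` is hypothetical.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd

def positive_rationals():
    """Enumerate the positive rational numbers diagonal by diagonal, each exactly once."""
    for s in count(2):              # s = m + n labels one diagonal of the matrix
        for m in range(1, s):
            n = s - m
            if gcd(m, n) == 1:      # "thinning-out": skip fractions that are not coprime
                yield Fraction(m, n)

first = list(islice(positive_rationals(), 10))
print(first)
```

The position of a fraction in this enumeration is the natural number assigned to it.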


A set whose cardinality corresponds to the cardinality of the natural numbers is 
called countable. In this sense natural numbers, integers and rational numbers are 
countable. Countable sets are of great importance because they can appear as 
index sets in sums and products. An expression of the form $\sum_{i \in A} a_i$ makes sense if 
and only if A is countable. If A = N one can even write $\lim_{n \to \infty} \sum_{i \le n} a_i$ for this 
sum. 

One could suspect that for all infinite sets it can be proven, with ingenious 
tricks, that they are countable. However, that is not the case and we will show 
for a very prominent set that it is larger than the set of natural numbers. 


1 The idea of this proof goes back to the founder of set theory, Georg Ferdinand Ludwig Philipp 
Cantor (1845-1918, German mathematician). 

Fig. 7.4 Cantor’s diagonal argument to prove the uncountability of real numbers. The listed 
numbers are written row by row as 0.a₁a₂a₃a₄..., 0.b₁b₂b₃b₄..., 0.c₁c₂c₃c₄..., and so on. 

Example 7.2 (Uncountability) We prove that the set of real numbers R has a 
different cardinality than the set of natural numbers. That is quite simple. 

To this end we assume that someone claims to be able to map the set of real 
numbers one-to-one to the set of natural numbers. This person would be able to list 
all real numbers one after the other. This would constitute a sequence of all real 
numbers. In particular, this person can name a unique predecessor and successor for 
each real number. We will show that at least one real number is still missing—which 
is a contradiction. This proves that the set of real numbers must be larger than the 
set of natural numbers. 

In Fig.7.4 we present the sequence of real numbers with their (possibly 
infinite) decimal representation which the above person claims to be complete, i.e., 
containing all real numbers. Instead of the decimals 0, 1, ..., 9 we use symbols 
aᵢ, bᵢ, cᵢ, dᵢ, ... for every real number.² 

The missing number can be constructed very easily. We consider Fig. 7.4 as a 
matrix of numbers and focus on the diagonal (the diagonal elements are printed in 
red). Using the diagonal we form a new real number of the form 0.z₁z₂z₃z₄.... As the 
first decimal z₁ of this new real number, a decimal must be selected such that it does 
not equal a₁. The second decimal must fulfill the inequality z₂ ≠ b₂, for the third 
decimal the inequality z₃ ≠ c₃ must hold, and so on. The new real number formed 
in this way cannot match any of the numbers mentioned in our person’s supposedly 
complete list. With each element of our person’s list (at least) one decimal in the 
representation is different from our newly constructed number. We have found the 
missing number! 
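
The construction of the missing number can be imitated for a finite list. The sketch below is only an illustration with a made-up list of four numbers; the rule for picking the new decimal (5, or 6 if the diagonal decimal is already 5) is one of many admissible choices, and the name `missing_number` is our own.

```python
def missing_number(listed: list[str]) -> str:
    """Decimals (after '0.') of a number that differs from the i-th listed number in its i-th decimal."""
    digits = []
    for i, row in enumerate(listed):
        # choose a decimal different from the diagonal element row[i]
        digits.append("5" if row[i] != "5" else "6")
    return "".join(digits)

listed = ["1415", "5000", "3355", "9999"]  # hypothetical list; each entry holds the decimals after "0."
z = missing_number(listed)
print("0." + z)
```

By construction the new number disagrees with every listed number in at least one decimal, so it cannot appear in the list.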


These considerations show that the set of real numbers cannot be counted. It 
is said that the real numbers are uncountable. Therefore, it follows that an expression 
of the form $\sum_{i \in \mathbb{R}} a_i$ does not represent a mathematically meaningful term: each 
element i in an index set must have a unique predecessor and a unique successor, a 
situation impossible for the real numbers R. 

Example 7.2 shows that there exist infinite sets with different cardinalities. The 
set of real numbers R is “larger” than N, while the set of natural numbers is “as 
large” as the sets Z and Q₊. In mathematics this is indicated by appropriate symbols. 
The number of natural numbers is not indicated by the rather fuzzy infinity sign ∞ 
but by the symbol ℵ₀.³ Since the cardinality of the real numbers is greater than ℵ₀, 
the symbol ℵ₁ is used. 

2 Without loss of generality we can ignore all digits before the decimal point. 


Concluding Remark Finally, we would like to draw the reader’s attention to an 
interesting issue. We have already shown that the set of natural numbers is smaller 
than the set of real numbers. Instead of the set of natural numbers, one could use 
their power set P(N), i.e., the set of all subsets of natural numbers. This power set 
contains the set of all even numbers, the set of all odd numbers, the set of all natural 
numbers less than 5, and so on. Without presenting the mathematical details, it can 
be shown that the power set has the same cardinality as the set of real numbers. On 
page 20 we had made it clear that for a finite set of n elements the number of subsets 
is just 2ⁿ. This relationship is carried over to the symbols just introduced by writing the 
following equation: 

\[ 2^{\aleph_0} = \aleph_1. \tag{7.2} \]

However, this symbolic notation should not be confused with real arithmetic 
operations. One must not write ℵ₀ = log₂(ℵ₁). 
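
For finite sets the counting rule for subsets is easy to verify by listing all of them. The following sketch does this for a small set; the helper name `power_set` is our own, and the approach obviously does not carry over to infinite sets.

```python
from itertools import chain, combinations

def power_set(elements):
    """All subsets of a finite collection, each returned as a tuple."""
    items = list(elements)
    return list(chain.from_iterable(combinations(items, r) for r in range(len(items) + 1)))

subsets = power_set([1, 2, 3])
print(subsets)
print(len(subsets) == 2 ** 3)
```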

What do these considerations tell us? If mathematicians transfer as in (7.2) a 
symbolic notation from one subject area to another, one is tempted to use it in 
all its dimensions. Unfortunately, such practice can be not only wrong but even 
dangerous. We have already experienced this situation while discussing the notation 
of Brownian motion. 


7.2 Continuous and Almost Nowhere Differentiable Functions 


In order to discuss the Brownian motion thoroughly, it is useful to deal with 
remarkable features of functions. The paths of Brownian motion are continuous 
functions which one cannot differentiate at (almost) any point. Anyone wanting 
to handle such functions properly must recognize that the use of mathematical 
operations known from ordinary analysis is inadmissible. Compared to ordinary 
analysis, dealing with Brownian paths can be considered “exotic.” 
Non-mathematicians probably cannot imagine continuous functions that are 
not differentiable (almost) anywhere. We would like to assist this understanding 
by an example developed by Weierstraß.⁴ He also showed that in mathematics 
such functions are anything but rare. Prior to Weierstraß these functions had been 
regarded as “monster curves.”⁵ It was assumed that these functions were either only 
special cases or that the points where differentiation is not possible were indeed rare. 

3 The symbol ℵ is the first letter of the Hebrew alphabet and is pronounced aleph. 

4 Karl Theodor Wilhelm Weierstraß (1815-1897, German mathematician). In 1872 Weierstraß 
introduced this function in a lecture and claimed that Riemann had knowledge of such an example. 
However, no such reference has been found in Riemann’s estate. Around 1830 Bolzano found 
the first example of a function that can be differentiated almost nowhere, in a manuscript that 
was published only in 1922. 

Fig. 7.5 Approximation of the Weierstraß function w(x) using the first seven summands 



Weierstraß considered the function 

\[ w(x) = \sum_{n=0}^{\infty} \frac{\sin(3^n x)}{2^n}. \tag{7.3} \]


To give an idea of the appearance of this function, Fig. 7.5 shows only the first 
seven summands of this series.⁶ We concentrate on two characteristics of the 
Weierstraß function: first its continuity and second its differentiability. 

Non-mathematicians state that a function is continuous if one can draw its path 
without interrupting the movement of the drawing pen. Although this is not a precise 
definition one may suspect that the Weierstraß function is continuous when looking 
at Fig. 7.5. Even with more precision the same result applies: the numerator of each 
fraction is at most 1 in absolute value and the denominator grows exponentially. Therefore, the sum 
converges for each x. Furthermore, it also converges uniformly. This means that 
the difference between the partial sum $\sum_{n=0}^{N} \sin(3^n x)/2^n$ and w(x), which goes to zero 
as N grows, can be estimated independently of x. In such cases the property of continuity of the summands 
sin(3ⁿx)/2ⁿ also applies to the function w(x). 
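
The uniform estimate can be made tangible numerically. The sketch below assumes the Weierstraß function has the form w(x) = Σ sin(3ⁿx)/2ⁿ with n running from 0 to infinity; the name `w_partial` is our own. Since every numerator lies between −1 and 1, all summands beyond index N contribute at most 2⁻ᴺ in total, whatever the value of x.

```python
import math

def w_partial(x: float, N: int) -> float:
    """Partial sum of the Weierstrass function using the summands n = 0, ..., N."""
    return sum(math.sin(3**n * x) / 2**n for n in range(N + 1))

# Compare the first seven summands (N = 6) with a much longer partial sum.
grid = [0.1 * k for k in range(100)]
largest_gap = max(abs(w_partial(x, 30) - w_partial(x, 6)) for x in grid)
print(largest_gap, "<=", 2 ** -6)
```

The bound 2⁻⁶ holds for every grid point simultaneously, which is exactly what uniform convergence expresses.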

The above considerations do not represent a complete proof but only give 
an indication of the evidence: the result is intuitively appealing. Looking at the 
definition of the function w(x) the following observation is decisive. The numerator 
of each additional summand lies in the interval [−1, 1]. On the other hand, 
the denominator of each new summand grows exponentially. Hence, each new 
summand (however it may behave) contributes only marginally to the change of 
the function value. Therefore, continuity is maintained in the limit. 

5 The French mathematician Charles Hermite (1822-1901) wrote in 1893 in a letter to Stieltjes: “I 
avert myself with horror and shock from this lamentable plague of functions that have no derivative 
at all.” 

6 The picture does not change very much if additional summands are added with the approximation 
error being reduced. 

Fig. 7.6 Cosine functions cos(3ⁿx) for three values of n (n = 1, n = 2, and one larger n) 

Let us turn to the second characteristic of the function w(x). Weierstraß was able 
to show that the function cannot be differentiated except for a few values x. While 
the proof is difficult, one can illustrate the result as follows: differentiating the sum term by term with 
respect to x one obtains⁷ 

\[ \frac{dw(x)}{dx} = \sum_{n=0}^{\infty} \left(\frac{3}{2}\right)^n \cos(3^n x). \tag{7.4} \]

To examine this limit in more detail we first ignore the factor (3/2)ⁿ and draw several 
graphs of the function cos(3ⁿx) depending on n (see Fig. 7.6). 

It can easily be seen that the frequency of the cosine function increases with every 
exponent n. Since the increasing fluctuations are multiplied by the factor (3/2)ⁿ, their 
impact on the sum grows with n. Obviously, the sum can only converge for numbers 
x where the cosine function approaches zero. The zeros of these cosine functions are 
very thinly scattered.⁸ For all other x the sum diverges to plus or minus infinity and 
this represents the typical case. Thus, the first derivative of this function is almost 
everywhere either minus or plus infinity. This implies that the function cannot be 
differentiated (almost) anywhere. 
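
The growth of the term-by-term derivative can be observed numerically. The sketch below assumes the summands (3/2)ⁿ cos(3ⁿx) and shares the weakness of the heuristic argument itself: differentiating term by term is not justified here, so the code illustrates the intuition and proves nothing. The name `derivative_partial` is our own.

```python
import math

def derivative_partial(x: float, N: int) -> float:
    """Sum of the terms (3/2)^n * cos(3^n * x) for n = 0, ..., N."""
    return sum((3 / 2) ** n * math.cos(3**n * x) for n in range(N + 1))

# At x = 0 every cosine equals 1, so the partial sums form a geometric series.
for N in (5, 10, 20):
    print(N, derivative_partial(0.0, N), 2 * (1.5 ** (N + 1) - 1))
```

At x = 0 the partial sums coincide with 2((3/2)ᴺ⁺¹ − 1) and therefore grow without bound.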


7 We interchange differentiation and infinite summation in our calculation, which is mathematically 
inadmissible under these circumstances. The following argument therefore does not constitute a full 
proof. 

8 The set of those x has Lebesgue measure zero. 

7.3 Convergence Terms 


From numerous discussions with students and colleagues we learned that there is 
certainly interest in looking more closely at the issue of convergence of functions. 
When looking at convergence of numbers it is entirely irrelevant how to define 
convergence precisely. Regardless of which definition of convergence of numbers is chosen, 
all turn out to be equivalent. However, this is entirely different when dealing with 
sequences of functions. There are many different ways to define convergence, with 
each option being fundamentally different from the others. While most 
non-mathematicians can imagine what a sequence of numbers is, the issue of dealing 
with a sequence of functions is very different. 

To illustrate this phenomenon we use an analogy. Finding the shortest route 
from Berlin to San Francisco depends on the way the earth is looked at. Using 
a conventional map of the world it will be concluded that the shortest route between 
the two cities always stays south of 53° North. However, when using a globe you 
will find that the shortest route is in fact via Greenland. This analogy is similar 
to the convergence concept for functions: there are not just one but several ways 
of defining the convergence of a sequence of functions. The results depend on the 
chosen convergence definition. 

Convergence is important in the context of limits. To understand the applications, 
it is useful to realize how proofs are conducted in the theory of Lebesgue 
integration⁹: if one wants to prove that a certain property or a given proposition 
applies in general, one can make life easier by first proving the correctness of 
the proposition for linear or piecewise linear functions. In order to show the general 
validity, one has to move from these simple functions to more general ones. To this 
end one has to consider the limit of a sequence of functions. A proposition applying 
to each (piecewise linear or simple) element of a function sequence will also apply 
to the limit of this sequence and thus to a general function. It should be noted that it must 
not matter whether one integrates first and subsequently passes to the limit or vice 
versa. Integration and limit must be interchangeable: 

\[ \lim_{n} \int_{\Omega} \;\overset{!}{=}\; \int_{\Omega} \lim_{n}. \tag{7.5} \]


Let us look at random variables as an example of functions. For random variables 
expectation and variance are (Lebesgue) integrals.¹⁰ From (7.5) it should follow 

\[ \lim_{n\to\infty} E[Z_n] \overset{!}{=} E\left[\lim_{n\to\infty} Z_n\right] \tag{7.6} \]

and 

\[ \lim_{n\to\infty} \mathrm{Var}[Z_n] \overset{!}{=} \mathrm{Var}\left[\lim_{n\to\infty} Z_n\right]. \tag{7.7} \]

Remember that Zₙ is a random variable and thus a measurable function. 

9 See page 71 ff. 
10 See page 80. 

The above claims deserve two remarks: first, there is an exclamation mark above 
the equal signs. We need a definition of a limit such that right and left sides are 
identical. Only then can limit and expectation or limit and variance be 
swapped. Second, consider the left sides of Eqs. (7.6) and (7.7), which represent limits of 
sequences of numbers since expected values and variances are numbers. The right 
sides of Eqs. (7.6) and (7.7) do not contain a sequence of numbers but a sequence of functions. 
While students of economics are aware of how to determine a limit of a sequence 
of numbers, they may not know what a sequence of functions is, let alone how to 
determine its limit. 

Before introducing two important concepts of convergence, namely pointwise 
convergence and mean square convergence,¹¹ we will start with sequences of 
numbers. 


Sequences of Numbers In mathematical analysis, it is stated that a sequence of 
numbers converges to a limit if the numbers with a sufficiently large index will 
approach a particular value. For example, if you look at the sequence of numbers 

\[ s_n = a + \frac{1}{n} \quad \text{with } n = 1, 2, \ldots, \tag{7.8} \]

we have 

\[ s_1 = a + 1, \quad s_2 = a + \frac{1}{2}, \quad s_3 = a + \frac{1}{3}, \tag{7.9} \]

and so on. By letting n increase the second summand decreases and approaches 
zero.¹² For n → ∞ the summand can be neglected. Thus, the sequence converges 
to a which is written as 

\[ \lim_{n\to\infty} s_n = \lim_{n\to\infty} \left(a + \frac{1}{n}\right) = a. \tag{7.10} \]


After exploring sequences of numbers we will now concentrate on sequences of 
functions. 


11 In addition to these two types of convergence, there exist in mathematics a few other definitions 
that will not be discussed here. 

12 One easily realizes that, for example, the sequence sₙ = (−1)ⁿ does not converge with increasing 
n. Such sequences are called divergent. 

Fig. 7.7 What is the limit of a function sequence? (curves shown for n = 1, n = 2, n = 3, and n → ∞) 

Sequences of Functions We look at the simple example 

\[ f_n(t) = a + \frac{t}{n}. \tag{7.11} \]

With increasing n one obtains 

\[ f_1(t) = a + t, \quad f_2(t) = a + \frac{t}{2}, \quad f_3(t) = a + \frac{t}{3}, \tag{7.12} \]

and so on. It seems clear that such a sequence of functions converges and how its 
limit is determined. In a sequence of numbers the individual numerical values 
should converge to a certain value in the limit. With a sequence of functions it is quite 
plausible to expect that with increasing n a function “clings to a limit function.” In 
the above example the functions fₙ(t) are approaching the limit function f(t) = a. 
Figure 7.7 illustrates this vividly. With increasing n the influence of the term 
t/n gets less and less significant in Eq. (7.11). The limit function takes the form 
$\lim_{n\to\infty} f_n(t) = a$. 


Pointwise Convergence This definition can be regarded as a “natural” candidate 
based on the above example. 


Definition 7.1 (Pointwise Convergence) Consider a sequence of functions of the 
form fₙ : Ω → R. 

A sequence of functions fₙ converges pointwise¹³ to a function f if and only if 
the following is valid¹⁴: 

\[ \lim_{n\to\infty} f_n(\omega) = f(\omega) \quad \forall\, \omega \in \Omega. \tag{7.13} \]

13 The noun is “pointwise convergence” and the verb is “to converge pointwise.” 

14 The definition is easy to interpret: it is required here that for each value ω the sequence fₙ(ω) 
converges to the number f(ω). So you concentrate on each value f(ω) and ignore the values 
f(ω + δ) “next to it” when considering convergence. 

Fig. 7.8 An example with regard to the pointwise convergence (n → ∞) 


With this definition of convergence integration and limit can be swapped only 
under certain conditions.¹⁵ 

We will now present an example which demonstrates that the interchangeability 
of integration and limit is lost if one uses pointwise convergence. The expected value 
of the limit does not equal the limit of expectations. 

Let us consider the state space Ω = R and a function fₙ which is zero on the 
real line except in the neighborhood of n ∈ R. The area below the function should 
be exactly one. Figure 7.8 illustrates such a function that shows a rectangle at index 
n. With increasing index the rectangle is moving to infinity.¹⁶ 

We look at this sequence of functions and apply the definition of pointwise 
convergence. Doing so we will show that the limit of this sequence is zero with 
the rectangle neither changing its form nor disappearing entirely. This might be 
surprising. 


• The functions fₙ converge pointwise to zero: consider a fixed value t. For t the 
following applies 

\[ \lim_{n\to\infty} f_n(t) = 0, \tag{7.14} \]

because any index n will eventually be greater than t. This is why the following 
must hold: 

\[ \lim_{n\to\infty} f_n(t) = 0 \quad \Longrightarrow \quad \int_{-\infty}^{\infty} \lim_{n\to\infty} f_n(t)\,dt = 0. \tag{7.15} \]

• On the other hand, the area under each function is 1 and therefore 

\[ \int_{-\infty}^{\infty} f_n(t)\,dt = \int_{n-\frac{1}{2}}^{n+\frac{1}{2}} f_n(t)\,dt = n + \frac{1}{2} - \left(n - \frac{1}{2}\right) = 1, \tag{7.16} \]

and therefore 

\[ \lim_{n\to\infty} \int_{-\infty}^{\infty} f_n(t)\,dt = \lim_{n\to\infty} 1 = 1. \tag{7.17} \]

15 Sufficient conditions are formulated in the theorem of monotone convergence. The theorem is 
due to Beppo Levi and can be found in any textbook on measure theory, for example, Rudin (1976), 
theorem 11.28. 

16 For example, consider f₃(t) and f₁(t). At t = 1 we have f₃(t) = 0 and f₁(t) = 1 and thus 
f₃(t) ≠ f₁(t). 


Equations (7.15) and (7.17) show that one must not interchange integration and limit 
in the sequence of functions considered here. This conclusion can be expressed as 


\[ \lim \int \neq \int \lim. \tag{7.18} \]


For the reasons described above such a result is useless. We must therefore note 

that pointwise convergence is not an appropriate concept. Rather, it is advisable 
to find another concept of convergence which permits the interchangeability of 
integration and limit. 
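
The moving rectangle can be reproduced in a short sketch. It assumes that fₙ has height 1 on the interval [n − 1/2, n + 1/2] and is zero elsewhere; the integration range, the number of steps, and the function names are our own choices, and the integral is only approximated by a midpoint sum.

```python
def f(n: int, t: float) -> float:
    """Rectangle of height 1 on [n - 1/2, n + 1/2], zero elsewhere."""
    return 1.0 if abs(t - n) <= 0.5 else 0.0

def integral(n: int, lo: float = -50.0, hi: float = 150.0, steps: int = 20_000) -> float:
    """Midpoint sum approximating the integral of f_n over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(f(n, lo + (k + 0.5) * h) for k in range(steps)) * h

t = 3.0
print([f(n, t) for n in range(1, 8)])   # pointwise: the values at t vanish once n has passed t
print(integral(3), integral(100))       # the area under each rectangle stays at 1
```

At any fixed t the sequence of values is eventually zero, while the area under every single function remains one.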

Mean Square Convergence This concept of convergence¹⁷ is used to ensure that 
integration (i.e., forming the expected value or the variance) and limit can be interchanged. To 
this end we assume a measure space (Ω, F, μ). It is presupposed that there is a 
sequence of measurable functions fₙ. 

Mean square convergence measures the difference between a function (out of the 
sequence) and its limit. It is defined such that the sequence 
converges if both the expectation and variance of this difference go to zero. The 
formal definition reads as follows. 


Definition 7.2 (Mean Square Convergence) A sequence of measurable functions 
fₙ converges in mean square to a function f, 

\[ \lim_{n\to\infty} f_n = f, \tag{7.19} \]

if and only if 

\[ \lim_{n\to\infty} \int_{\Omega} \left| f_n(\omega) - f(\omega) \right|^2 \, d\mu(\omega) = 0 \tag{7.20} \]

applies. 

17 In the literature mean square convergence is also labeled as L²-convergence. 


We will show that the mean square convergence ensures that integration and limit 
can be interchanged. For this we concentrate again on a probability measure, i.e., 
we consider random variables. We use the definition of mean square convergence 
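and rely on the identity (5.36). Assume $\lim_{n\to\infty} f_n = f$. Thus we get from (7.20) 

\[ \begin{aligned} 0 &= \lim_{n\to\infty} \int_{\Omega} \left| f_n(\omega) - f(\omega) \right|^2 \, d\mu(\omega) \\ &= \lim_{n\to\infty} \left( \mathrm{Var}[f_n - f] + E^2[f_n - f] \right) \\ &= \lim_{n\to\infty} \mathrm{Var}[f_n - f] + \lim_{n\to\infty} E^2[f_n - f]. \end{aligned} \tag{7.21} \]

Since neither of the two summands can be negative, both $\lim_{n\to\infty} \mathrm{Var}[f_n - f] = 0$ 
and $\lim_{n\to\infty} E^2[f_n - f] = 0$ apply. If the limit of the squared expectation is 
zero, $\lim_{n\to\infty} E[f_n - f] = 0$ must hold. The expectation is linear, and therefore 
$\lim_{n\to\infty} E[f_n] = E[f]$ is true. Thus $\lim_{n\to\infty} E[f_n] = E[\lim_{n\to\infty} f_n]$. That was 
what we had to show. 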
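
A small numerical illustration may help. The example is entirely our own and not taken from the text: on Ω = [0, 1] with the Lebesgue measure we consider the hypothetical sequence fₙ(ω) = ω² + ω/n with limit f(ω) = ω². The integrals are approximated by a midpoint sum, so the printed numbers are approximations.

```python
def integrate(g, steps: int = 10_000) -> float:
    """Midpoint sum approximating the integral of g over [0, 1]."""
    h = 1.0 / steps
    return sum(g((k + 0.5) * h) for k in range(steps)) * h

def f_limit(w: float) -> float:
    return w ** 2

def f_n(n: int, w: float) -> float:
    return w ** 2 + w / n

def mean_square_distance(n: int) -> float:
    """Integral of |f_n - f|^2, which must go to zero for mean square convergence."""
    return integrate(lambda w: (f_n(n, w) - f_limit(w)) ** 2)

def expectation(n: int) -> float:
    return integrate(lambda w: f_n(n, w))

for n in (1, 10, 100):
    print(n, mean_square_distance(n), expectation(n))
print("expectation of the limit:", integrate(f_limit))
```

In this example the mean square distance equals 1/(3n²) and the expectations 1/3 + 1/(2n) approach the expectation 1/3 of the limit function, so limit and expectation can indeed be interchanged.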


7.4 Conditional Expectations Are Random Variables 


Finally, we want to draw the reader’s attention to an aspect of conditional 
expectations that goes back to Kolmogoroff.¹⁸ So far we have realized that a 
conditional expectation is a real number that refers to an event A (the condition).¹⁹ 
The expectation depends on this event A. If we choose a different event, a different 
expectation will usually result. Therefore, Kolmogoroff has proposed that the 
conditional expectation should be interpreted as a random variable.²⁰ 

To understand this idea we need to remember how we had defined random 
variables. We wanted to perceive them as functions of elementary events. On page 
83 we have shown that a random variable X can be characterized as a function 

\[ X : \Omega \to \mathbb{R} \tag{7.22} \]

with its conditional expectation 

\[ E[X|\mathcal{F}] : \Omega \to \mathbb{R} \tag{7.23} \]

also being interpreted as a random variable. The following two examples will help 
to better understand this concept. 

18 Andrei Nikolayevich Kolmogoroff (1903-1987), Russian mathematician. 
19 See page 80 ff. 
20 See Kolmogoroff (1933), page 41 ff. 

Table 7.1 States, cash flows CF₃, and conditional expectations of the cash flows in the 
binomial model of Example 5.6 

  ω ∈ Ω    CF₃ ∈ R    E[CF₃|F₂]
  uuu      140        ) 135
  uud      130        )
  udu      130        ) 127.5
  udd      125        )
  duu      130        ) 127.5
  dud      125        )
  ddu      125        ) 82.5
  ddd       40        )


Example 7.3 (Binomial Model) With Table 7.1 we refer to Example 5.6 from page 
83. While the first column of this table shows the states, the second column 
represents the cash flows CF3. The conditional expectation (at time t = 2) is given 
in the third column. 

The σ-algebra F₂ corresponds to the set of information that the decision-maker 
assumes today he will have available at the time t = 2. On the basis of 
this information the decision-maker forms his expectations. In Table 7.1 we have 
grouped by parentheses those states that cannot be distinguished at time t = 2. Let 
us call the combination of two such states a “box.” At time t = 2 he only knows 
which box he will be in but he cannot distinguish the states within the box. 


Example 7.3 demonstrates the following: if a specific elementary event ω is 
given, the event {ω} and other elementary events are combined into a set A (the 
above-mentioned “box”). The set A contains only those elementary events that the 
decision-maker cannot distinguish from ω on the basis of his given information set. 
In this example he was able to observe the uu node at t = 2 but did not (yet) know 
whether the state uuu or uud will occur at t = 3. The conditional expected value 
E[X|F] assigns the actual number E[X|A] to the elementary event w. To determine 
the conditional expected values, the payments associated with the elementary events 
are weighted with their respective probabilities of occurrence. 
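
The “boxes” can be formed mechanically. The sketch below uses the cash flows as we read them from Table 7.1 and assumes that the states within a box are equally likely, which is consistent with the conditional expectations listed there; the function name and the parameter `known_moves` are our own.

```python
from collections import defaultdict

# Cash flows CF3 per state (assumed reading of Table 7.1)
cash_flows = {"uuu": 140, "uud": 130, "udu": 130, "udd": 125,
              "duu": 130, "dud": 125, "ddu": 125, "ddd": 40}

def conditional_expectation(cf: dict[str, float], known_moves: int) -> dict[str, float]:
    """Assign to every state the average cash flow of its box (equal probabilities assumed)."""
    boxes = defaultdict(list)
    for state in cf:
        boxes[state[:known_moves]].append(state)  # states sharing the observed moves form a box
    result = {}
    for members in boxes.values():
        value = sum(cf[s] for s in members) / len(members)
        for s in members:
            result[s] = value
    return result

print(conditional_expectation(cash_flows, known_moves=2))
```

The result is a function of the state, i.e., a random variable: it takes the same value on all states of a box and different values on different boxes.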


Fig. 7.9 Illustration of the conditional expectation E[X|F]

Example 7.4 (Share Price) To further deepen our reflections we consider a state
space Ω = [0, 1]. Each real number ω ∈ [0, 1] represents an elementary event. If we
choose the Lebesgue measure²¹ λ with the corresponding σ-algebra, a probability
space is generated since λ(Ω) = 1 holds.
Let us consider the random variable 


\[ X(\omega) = \omega^2. \tag{7.24} \]


With the elementary event ω = 1/2 the random variable assumes the value X(ω) = 1/4.
We present the path of this random variable in Fig. 7.9 as a dashed curve.
Let us determine the conditional expectation for the following σ-algebra


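\[ \mathcal{F} = \left\{ \emptyset,\ \left[0, \tfrac{1}{2}\right),\ \left[\tfrac{1}{2}, 1\right],\ \Omega \right\}. \tag{7.25} \]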


In this case the decision-maker cannot tell with certainty which specific elementary
event ω ∈ [0, 1] is present; instead he receives only the information whether the
elementary event is less than 1/2 or at least 1/2.²² This is all he knows. What is the
conditional expectation of the random variable X?


21 See page 53. 


22 For mathematical reasons, the second set in the σ-algebra must be a half-open interval. If we
added the set [0, 1/2] to the σ-algebra, the intersection

\[ \left[0, \tfrac{1}{2}\right] \cap \left[\tfrac{1}{2}, 1\right] = \left\{\tfrac{1}{2}\right\} \]

would also be measurable and the decision-maker could determine whether the state ω = 1/2 has
occurred. But that would be more than we wanted to assume.
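Concentrating on the first subinterval we get according to (5.37)²³ a conditional
expectation of

\[ E\left[X \,\middle|\, \omega < \tfrac{1}{2}\right] = \frac{1}{1/2} \int_0^{1/2} \omega^2 \, d\lambda(\omega) = 2 \left[\frac{\omega^3}{3}\right]_0^{1/2} \tag{7.27} \]

and for the second subinterval

\[ E\left[X \,\middle|\, \omega \ge \tfrac{1}{2}\right] = \frac{1}{1/2} \int_{1/2}^{1} \omega^2 \, d\lambda(\omega) = 2 \left[\frac{\omega^3}{3}\right]_{1/2}^{1}. \tag{7.28} \]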


Thus, we can present the conditional expectation simply by 


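\[ E[X \mid \mathcal{F}](\omega) = \begin{cases} \tfrac{1}{12} & \text{if } \omega < \tfrac{1}{2}, \\ \tfrac{7}{12} & \text{if } \omega \ge \tfrac{1}{2}. \end{cases} \tag{7.29} \]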


Figure 7.9 shows the form of the conditional expectation, which is a piecewise
constant function with a jump at ω = 1/2.

As before we recognize the idea of conditional expectation. Beginning with an
elementary event ω one must first determine the smallest set A which is part of the
σ-algebra F and also includes ω. The conditional expectation E[X|A] is calculated
using Eq. (5.37) and represents the value of the random variable E[X|F] at ω.
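
The recipe just described can be checked for Example 7.4 with exact arithmetic. The sketch below is an illustration under the assumptions that X(ω) = ω² and that the σ-algebra splits [0, 1] into [0, 1/2) and [1/2, 1]; the function names are hypothetical. It uses the antiderivative ω³/3 rather than numerical integration.

```python
from fractions import Fraction


def integral_of_square(a, b):
    """Integral of w**2 over [a, b] via the antiderivative w**3 / 3."""
    return (b**3 - a**3) / 3


def conditional_expectation(omega):
    """E[X|F](omega) for X(w) = w**2 and F generated by [0, 1/2) and [1/2, 1]."""
    half = Fraction(1, 2)
    # Smallest set of the sigma-algebra that contains omega.
    a, b = (Fraction(0), half) if omega < half else (half, Fraction(1))
    # The Lebesgue measure of the interval is its length b - a.
    return integral_of_square(a, b) / (b - a)


print(conditional_expectation(Fraction(1, 4)))  # 1/12, the value on [0, 1/2)
print(conditional_expectation(Fraction(3, 4)))  # 7/12, the value on [1/2, 1]
```

Averaging the two values with weight 1/2 each gives 1/3, which is the unconditional expectation of ω² on [0, 1].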

Finally, let us present the following rules for calculating conditional
expectations.


Expected value of known quantities If X ∈ F (it is also said that X is
F-measurable), then E[X|F] = X applies.
In order to illustrate the theorem imagine having to determine the conditional
expectation of an uncertain quantity X(ω). However, the situation is such that the
uncertain state ω can be derived directly from the observed value of the quantity
X. Thus the observed quantity is not really uncertain, a result confirming the first
theorem.
Further, if Z is F-measurable and bounded, then E[Z · X|F] = Z · E[X|F] holds.
Linearity For any numbers a, b the following is true: E[aX + bY|F] =
a E[X|F] + b E[Y|F].
Since the conditional expectation represents a generalization of the classic 
(unconditional) expectation, the property of linearity remains valid. That is the 
substance of this theorem. 


23 See page 83.
Monotonicity If X ≥ 0, then E[X|F] ≥ 0 applies.

Since probabilities are nonnegative the expected value of nonnegative variables 
remains nonnegative. This applies to conditional expectations as well. 

Limit almost everywhere If X_n is a monotonically increasing sequence of random
variables which converges to X almost everywhere and if X has a finite
expectation, lim_{n→∞} E[X_n|F] = E[X|F] holds.
We emphasized in Sect. 7.3 that the interchangeability of limit and expectation
is of considerable importance in probability theory. This is one of the
strengths of the concept of conditional expectation. Under certain conditions
limit and expectation can be swapped when the sequence converges almost everywhere.

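Iterated expectation If F ⊂ G, then E[E[X|G]|F] = E[X|F].

If iterated conditional expectations are to be calculated, the inner conditioning
on G can be omitted: conditioning first on the finer σ-algebra G and then on the
coarser σ-algebra F gives the same result as conditioning on F directly.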
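
Several of the rules listed above can be verified on the binomial model of Example 7.3. The following Python sketch is an illustration, not a proof: it assumes equally likely states, takes F_t to be the information generated by the first t moves, and introduces the random variables `Y` and `Z` purely as hypothetical examples.

```python
from fractions import Fraction

# Cash flows CF3 of the binomial example (Table 7.1); equal state
# probabilities are an assumption made for this illustration.
X = {"uuu": 140, "uud": 130, "udu": 130, "udd": 125,
     "duu": 130, "dud": 125, "ddu": 125, "ddd": 40}
STATES = list(X)


def cond_exp(V, t):
    """Return E[V | F_t] as a dict: state -> average of V over its box."""
    result = {}
    for state in STATES:
        box = [s for s in STATES if s[:t] == state[:t]]
        result[state] = Fraction(sum(V[s] for s in box), len(box))
    return result


# Hypothetical examples: Y counts the up moves, Z depends only on the
# first two moves and is therefore F_2-measurable.
Y = {s: s.count("u") for s in STATES}
Z = {s: 1 if s[:2] == "uu" else 2 for s in STATES}

# Known quantities: E[Z|F_2] = Z, and Z can be pulled out of E[Z*X|F_2].
ZX = {s: Z[s] * X[s] for s in STATES}
known = cond_exp(Z, 2) == Z
pull_out = cond_exp(ZX, 2) == {s: Z[s] * cond_exp(X, 2)[s] for s in STATES}

# Linearity with a = 2 and b = 3.
lin_lhs = cond_exp({s: 2 * X[s] + 3 * Y[s] for s in STATES}, 2)
lin_rhs = {s: 2 * cond_exp(X, 2)[s] + 3 * cond_exp(Y, 2)[s] for s in STATES}
linear = lin_lhs == lin_rhs

# Iterated expectation: F_1 is contained in F_2.
tower = cond_exp(cond_exp(X, 2), 1) == cond_exp(X, 1)

print(known, pull_out, linear, tower)  # True True True True
```

Exact fractions are used so that the equalities can be compared without rounding error.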